Abstract. Superquantile risk, also known as conditional value-at-risk (CVaR), is widely used as
a coherent measure of risk due to its improved properties over those of quantile risk (value-at-risk).
In this paper, we consider second-order superquantile/CVaR measures of risk, which represent further
“smoothing” by averaging the classical quantities. We also step further and examine the more general
“mixed” superquantile/CVaR measures of risk with fundamental importance in dual utility theory. We
establish representations of these mixed and second-order superquantile risk measures in terms of risk
profiles, risk envelopes, and risk identifiers. The expressions facilitate the development of dual
methods for mixed and second-order superquantile risk minimization as well as superquantile regression, a
second-order version of quantile regression.
1  Introduction

The question of how to assess and rank uncertain quantities represented by random variables takes center
stage in many areas of operations research, engineering, and economics. The axiomatic framework of
coherency laid out in [2] provides guidance for constructing measures of risk that quantify the “risk”
in a random variable. Conditional value-at-risk (CVaR) [17, 18], also called superquantile risk², is a
type of coherent risk measure having importance in its own right, but central also as a building block
for all law-invariant coherent risk measures [9, 7, 13, 11, 26]. In particular, the weighted average of
superquantile risk measures across probability levels gives rise to mixed superquantile risk measures,
also called spectral risk measures [1] and Choquet representation of distortion acceptability functionals
[12], that are appealing to practitioners. In fact, such risk measures correspond to a class of utility
functions of dual utility theory [28]; see [12, 6, 24] for details. Properties of these and other “mixed”
risk measures are clarified further in the Mixing Theorem of [19].

¹ This material is based upon work supported in part by the U. S. Air Force Office of Scientific Research under grants
FA9550-11-1-0206 and F1ATA01194G001.

² The quantity originally proposed under the name conditional value-at-risk is also called average value-at-risk and
expected shortfall. With the increasing number of applications beyond finance and the need for treating conditional
random variables, however, the name “superquantile risk” seems more appropriate.

In this paper, we are motivated by emerging applications of second-order superquantiles, especially
in risk-averse regression [16]. A second-order superquantile (or second-order CVaR) is the normalized
integral of superquantiles (CVaRs) with respect to the probability level. In that sense, second-order
superquantiles are particular instances of mixed superquantiles. As shown in [14, 15, 16] and summarized
below, a second-order superquantile of a random variable arises from a certain “smoothing” of
its distribution function such that quantiles of the smoothed distribution function coincide with the
superquantiles of the original distribution function. In the same manner as a superquantile risk (CVaR)
is more conservative and mathematically better behaved than the corresponding quantile risk
(value-at-risk), second-order superquantile risk (second-order CVaR) is more conservative and better behaved
than the corresponding superquantile (CVaR). (The higher-order CVaR introduced in [8] and studied
further in [5] is unrelated to our development.) A particular application of second-order superquantiles
is in the domain of generalized regression. We laid out in [16] a parallel methodology to that of quantile
regression, which instead of estimating conditional quantiles, estimates conditional superquantiles. The
resulting estimation problem is essentially a second-order superquantile minimization problem.

Although  second-order  superquantiles  serve  as  a  primary  motivation,  little  additional  complication 
derives  from  considering  the  broader  class  of  general  mixed  superquantile  risk  measures,  so  we  proceed 
in  that  setting.  Properties  of  second-order  superquantiles  then  follow  as  corollaries. 

The contributions of the paper are as follows. We establish representations of mixed and second-order
superquantiles in terms of risk profiles by extending results in [20, 21]. We provide a detailed
characterization of risk envelopes of mixed and second-order superquantile risk measures as well as corresponding
risk identifiers that furnish the maximizing change of measure in dual representations of such risk measures.
These expressions facilitate the development of dual methods for mixed and second-order superquantile
risk minimization as well as for superquantile regression, the second-order version of quantile regression.

Although  dualization  of  risk  measures  can  be  carried  out  for  a  variety  of  spaces  of  random  variables 
and  paired  dual  spaces  (see  for  example  [25]),  we  here  focus  on  random  variables  with  finite  second 
moments.  In  addition  to  the  fact  that  this  choice  results  in  a  “balance”  between  the  original  space  of 
random  variables  and  a  paired  dual  space,  which  then  can  be  selected  to  be  the  same  space,  random 
variables  with  finite  second  moment  are  also  guaranteed  to  have  finite  superquantiles  for  any  probability 
level  less  than  1  as  demonstrated  in  [16].  Consequently,  we  are  able  to  guarantee  finiteness  of  second- 
order  superquantile  risk  measures  along  with  a  specific  condition  for  finiteness  of  mixed  superquantile 
risk  measures. 

The  remainder  of  the  paper  is  organized  as  follows.  Section  2  gives  background.  Section  3  presents 
definitions  of  mixed  and  second-order  superquantiles  as  well  as  basic  properties.  Section  4  provides 
dual  characterizations  of  mixed  and  second-order  superquantiles.  Section  5  discusses  the  application  of 
the  preceding  results  in  risk  optimization  and  superquantile  regression  problems. 
2  Background

For a probability space $(\Omega, \mathcal{F}, P)$, we let

$$\mathcal{L}^2 = \mathcal{L}^2(\Omega, \mathcal{F}, P) := \{X : \Omega \to \mathbb{R} \mid X \text{ is } \mathcal{F}\text{-measurable},\ E[X^2] < \infty\}$$

be the space of random variables with finite second moment, where we write integration with respect
to $P$ using the standard notation $E[X] = \int X(\omega)\,dP(\omega)$. We equip $\mathcal{L}^2$ with the standard norm

$$\|X\|_2 := (E[X^2])^{1/2}.$$

In the following, we deal with classes of measures of risk defined on $\mathcal{L}^2$. Regularity [19] provides
fundamental properties for such risk measures. We recall that a measure of risk
$\mathcal{R} : \mathcal{L}^2 \to (-\infty, \infty]$ is regular if it satisfies the following axioms:

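$\mathcal{R}(X) = c$ for constant random variables $X \equiv c$,

$\mathcal{R}((1-\tau)X + \tau X') \le (1-\tau)\mathcal{R}(X) + \tau\mathcal{R}(X')$ for all $X, X' \in \mathcal{L}^2$ and $\tau \in (0,1)$ (convexity),

$\{X \in \mathcal{L}^2 \mid \mathcal{R}(X) \le c\}$ is closed for all $c \in \mathbb{R}$ (closedness),

$\mathcal{R}(X) > E[X]$ for nonconstant $X \in \mathcal{L}^2$ (averseness).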

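We say that a risk measure $\mathcal{R}$ is positively homogeneous if

$$\mathcal{R}(\tau X) = \tau\mathcal{R}(X) \quad \text{for } \tau > 0,\ X \in \mathcal{L}^2,$$

and monotonic if

$$\mathcal{R}(X) \le \mathcal{R}(Y) \quad \text{whenever } X(\omega) \le Y(\omega) \text{ for a.e. } \omega \in \Omega.$$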

We characterize the distribution of an $X \in \mathcal{L}^2$ by its right-continuous, nondecreasing cumulative
distribution function

$$F_X(x) := P(\{\omega \in \Omega \mid X(\omega) \le x\}), \quad x \in \mathbb{R}.$$

Equivalently, it can be characterized by the left-continuous, nondecreasing, finite-valued quantile
function

$$G_X(\alpha) := \min\{x \in \mathbb{R} \mid F_X(x) \ge \alpha\}, \quad \alpha \in (0,1),$$

or by the continuous, nondecreasing first-order superquantile function
$\bar{G}_X : [0,1] \to (-\infty, \infty]$, where

$$\bar{G}_X(\alpha) := \frac{1}{1-\alpha}\int_\alpha^1 G_X(\beta)\,d\beta, \quad \alpha \in [0,1), \tag{1}$$

and $\bar{G}_X(1) := \sup X$ (the essential supremum). We include the prefix “first-order” to distinguish the
superquantile function from the subsequent development of a second-order theory. Since
$G_X : (0,1) \to \mathbb{R}$ is discontinuous at most for a countable number of points in $(0,1)$, the integral is
well-defined. We observe that $\bar{G}_X(0) = E[X]$ and for nonconstant $X \in \mathcal{L}^2$, $\bar{G}_X$ is strictly increasing.

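An alternative expression for $\bar{G}_X(\alpha)$, $\alpha \in [0,1)$, is furnished by (see [18])

$$\bar{G}_X(\alpha) = \int_{-\infty}^{\infty} x\,dF_X^\alpha(x), \quad \text{with } F_X^\alpha(x) := \begin{cases} \dfrac{F_X(x) - \alpha}{1-\alpha} & \text{if } F_X(x) \ge \alpha \\ 0 & \text{if } F_X(x) < \alpha. \end{cases} \tag{2}$$

The quantity $\bar{G}_X(\alpha)$ can therefore be interpreted as a conditional expectation of $X$ given that
$X \ge G_X(\alpha)$ whenever there is no probability atom at $G_X(\alpha)$, i.e., $P(\{\omega \in \Omega \mid X(\omega) = G_X(\alpha)\}) = 0$.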

An  example  of  a  regular  measure  of  risk  that  is  also  positively  homogeneous  and  monotonic  is 
the  well-known  superquantile/CVaR  risk  measure,  defined  next,  which  we  here  label  “first-order”  to 
distinguish  it  from  the  second-order  extensions  of  Section  3. 

2.1 Definition (first-order superquantile risk measure) For a given $\alpha \in [0,1)$, a measure of risk
$\mathcal{R}_\alpha : \mathcal{L}^2 \to (-\infty, \infty]$ of the form

$$\mathcal{R}_\alpha(X) := \bar{G}_X(\alpha)$$

is called a first-order superquantile measure of risk.
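As an illustration of the quantile and first-order superquantile functions defined above, the following Python sketch evaluates both for a finite sample in which every observation has probability 1/n. The equally weighted sample and the function names `quantile` and `superquantile` are assumptions made for this example and are not part of the paper.

```python
import math


def quantile(sample, alpha):
    """Quantile min{x : F(x) >= alpha} of an equally weighted sample, alpha in (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # smallest k with k/n >= alpha; the small offset guards against rounding in alpha * n
    k = max(1, math.ceil(alpha * n - 1e-12))
    return xs[k - 1]


def superquantile(sample, alpha):
    """First-order superquantile (CVaR): average of the quantile function over (alpha, 1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        # the quantile function equals x on ((k-1)/n, k/n]; keep only the part above alpha
        overlap = max(0.0, k / n - max((k - 1) / n, alpha))
        total += x * overlap
    return total / (1.0 - alpha)


sample = [1.0, 2.0, 3.0, 4.0]
for alpha in (0.0, 0.5, 0.75):
    print(alpha, quantile(sample, max(alpha, 0.01)), superquantile(sample, alpha))
```

For this sample the superquantile at level 0 is the mean 2.5, at level 0.5 it is the average 3.5 of the two largest observations, and at level 0.75 it is the largest observation 4.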

Obviously, $E[X] \le \bar{G}_X(\alpha) \le \sup X$ and $\bar{G}_X(\alpha) > E[X]$ for nonconstant $X$ unless $\alpha = 0$. Moreover,
from [16, Proposition 1] we also know that for $\alpha < 1$, $\bar{G}_X(\alpha)$ is bounded from above by an expression
involving the standard deviation

$$\sigma(X) := (E[(X - E[X])^2])^{1/2}.$$

Combining these facts, we can state the following results.


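2.2 Proposition For $X \in \mathcal{L}^2$ and $\alpha \in [0,1)$,

$$E[X] \le \bar{G}_X(\alpha) = \mathcal{R}_\alpha(X) \le \min\left\{ E[X] + \frac{\sigma(X)}{\sqrt{1-\alpha}},\ \sup X \right\},$$

with the lower bound being strict for nonconstant $X$ unless $\alpha = 0$.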
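The two-sided bound of Proposition 2.2 can be checked numerically. The sketch below does so for an equally weighted sample; the sample values, the probability levels, and the helper names `superquantile` and `bounds` are illustrative assumptions, and the population standard deviation (division by n) is used because the sample is treated as the full distribution.

```python
import math


def superquantile(sample, alpha):
    """First-order superquantile (CVaR) of an equally weighted sample, alpha in [0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        # the quantile function equals x on ((k-1)/n, k/n]; keep only the part above alpha
        overlap = max(0.0, k / n - max((k - 1) / n, alpha))
        total += x * overlap
    return total / (1.0 - alpha)


def bounds(sample, alpha):
    """Lower bound E[X] and upper bound min{E[X] + sigma/sqrt(1 - alpha), sup X}."""
    n = len(sample)
    mean = sum(sample) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    upper = min(mean + sigma / math.sqrt(1.0 - alpha), max(sample))
    return mean, upper


sample = [-3.0, 0.5, 1.0, 2.0, 10.0]
for alpha in (0.0, 0.3, 0.6, 0.9):
    lower, upper = bounds(sample, alpha)
    print(alpha, lower, superquantile(sample, alpha), upper)
```

Each printed row lists the level, the lower bound, the superquantile, and the upper bound, so the ordering can be read off directly.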

We end this section by recalling a consequence of the Fubini-Tonelli Theorem, which soon will be
put to use in Section 3, and adopt the following notation. For a set $S$ with a topology, let $\mathcal{B}_S$ be its
Borel sigma-algebra. We denote by $m$ the Lebesgue measure defined on $\mathcal{B}_S$, $S = \mathbb{R}$ or any subset of
$\mathbb{R}$. Let $\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, \infty\}$. Given measurable spaces $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$, an $(\mathcal{A}, \mathcal{B})$-measurable function
$f : \mathcal{X} \to \mathcal{Y}$ is simply referred to as $\mathcal{A}$-measurable when $\mathcal{Y}$ is topological and $\mathcal{B}$ is the sigma-algebra $\mathcal{B}_{\mathcal{Y}}$.


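2.3 Proposition Suppose that $(\mathcal{X}, \mathcal{A}, \mu)$ and $(\mathcal{Y}, \mathcal{B}, \nu)$ are sigma-finite measure spaces. If
$f : \mathcal{X} \times \mathcal{Y} \to \overline{\mathbb{R}}$ is measurable with respect to the product sigma-algebra on $\mathcal{X} \times \mathcal{Y}$ and
$g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is integrable with respect to the product measure $\mu \times \nu$, with $f(x,y) \ge g(x,y)$ for
$(\mu \times \nu)$-a.e. $(x,y) \in \mathcal{X} \times \mathcal{Y}$, then the following hold:

(i) the function $h_1 = \int f(x, \cdot)\,d\mu(x)$ is $\mathcal{B}$-measurable,

(ii) the function $h_2 = \int f(\cdot, y)\,d\nu(y)$ is $\mathcal{A}$-measurable,

(iii) and

$$\int f\,d(\mu \times \nu) = \int \left[ \int f(x,y)\,d\mu(x) \right] d\nu(y) = \int \left[ \int f(x,y)\,d\nu(y) \right] d\mu(x).$$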
Proof. We recall that the integral of the sum of a nonnegative measurable function and an integrable
function equals the sum of the individual integrals under the usual rules for handling addition with
infinity. Then,

$$h_1 = \int f(x, \cdot)\,d\mu(x) = \int (f - g)(x, \cdot)\,d\mu(x) + \int g(x, \cdot)\,d\mu(x)$$

is $\mathcal{B}$-measurable since both terms on the right-hand side are $\mathcal{B}$-measurable by the Fubini-Tonelli
Theorem. A similar argument yields the conclusion for $h_2$. The final assertion follows by applying the
Fubini-Tonelli Theorem to $f - g$ and $g$, and the above rule about interchange of summation and
integration. □


3  Mixed  and  Second-Order  Superquantile/CVaR  Risk 

We start with a parallel to (1) and define the second-order superquantile function
$\bar{\bar{G}}_X : [0,1) \to (-\infty, \infty]$ as

$$\bar{\bar{G}}_X(\alpha) := \frac{1}{1-\alpha}\int_\alpha^1 \bar{G}_X(\beta)\,d\beta, \quad \alpha \in [0,1). \tag{3}$$

Analogously  to  Definition  2.1,  this  function  generates  the  second-order  superquantile  risk  measures  as 
defined  next. 

3.1 Definition (second-order superquantile risk measure) For a given $\alpha \in [0,1)$, a measure of risk
$\bar{\mathcal{R}}_\alpha : \mathcal{L}^2 \to (-\infty, \infty]$ of the form

$$\bar{\mathcal{R}}_\alpha(X) := \bar{\bar{G}}_X(\alpha)$$

is called a second-order superquantile measure of risk.

As we establish shortly, $\bar{\mathcal{R}}_\alpha$ is a regular measure of risk.
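To make the definition concrete, the following sketch approximates a second-order superquantile by averaging first-order superquantiles over probability levels between alpha and 1 with a midpoint rule. The equally weighted sample, the number of integration steps, and the function names are assumptions made for illustration; the midpoint rule is a generic quadrature choice, not a method taken from the paper.

```python
def superquantile(sample, alpha):
    """First-order superquantile (CVaR) of an equally weighted sample, alpha in [0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        # the quantile function equals x on ((k-1)/n, k/n]; keep only the part above alpha
        overlap = max(0.0, k / n - max((k - 1) / n, alpha))
        total += x * overlap
    return total / (1.0 - alpha)


def second_order_superquantile(sample, alpha, steps=20000):
    """Midpoint-rule approximation of the normalized integral of superquantiles over (alpha, 1)."""
    width = (1.0 - alpha) / steps
    total = 0.0
    for j in range(steps):
        beta = alpha + (j + 0.5) * width  # midpoint of the j-th subinterval, always below 1
        total += superquantile(sample, beta)
    # (1 / (1 - alpha)) * sum * width reduces to the plain average over midpoints
    return total / steps


sample = [1.0, 2.0, 3.0, 4.0]
for alpha in (0.0, 0.5):
    print(alpha, superquantile(sample, alpha), second_order_superquantile(sample, alpha))
```

In each printed row the second-order value is at least as large as the first-order value at the same level, which reflects the additional conservativeness of the second-order measure.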

A motivation for considering such risk measures is furnished by the natural extension of the idea
behind passing from quantiles to first-order superquantiles: to obtain better behaved and more
conservative expressions for risk. Specifically, starting from a random variable $X$ with cumulative distribution
function $F_X$, the transformation

$$\bar{X} = \bar{G}_X(F_X(X))$$

constructs a new random variable $\bar{X}$ whose quantiles coincide with the first-order superquantiles of $X$,
i.e., $G_{\bar{X}}(\alpha) = \bar{G}_X(\alpha)$ for all $\alpha \in (0,1)$. In view of the definition of first-order superquantiles in (1), we
then find that

$$\mathcal{R}_\alpha(\bar{X}) = \bar{G}_{\bar{X}}(\alpha) = \frac{1}{1-\alpha}\int_\alpha^1 G_{\bar{X}}(\beta)\,d\beta = \frac{1}{1-\alpha}\int_\alpha^1 \bar{G}_X(\beta)\,d\beta = \bar{\mathcal{R}}_\alpha(X).$$

Clearly, $\bar{\mathcal{R}}_\alpha$ is more conservative than $\mathcal{R}_\alpha$ and represents further smoothing (averaging) of the
corresponding quantile function beyond what is already achieved by a first-order superquantile. Additional
motivation derives from the fact that $\bar{\mathcal{R}}_\alpha$ represents particular preferences of a decision maker according
to dual utility theory (see [28, 12, 6, 24]) as well as the connections with superquantile regression [16]
revealed in Section 4.
The first- and second-order superquantile risk measures fit into a larger picture of mixed superquantile
risk measures defined in terms of weighting of first-order superquantiles at different probability levels.
The general mixed superquantile risk measures are also of importance in their own right due to their
coherency and close connection with dual utility theory; see the discussion in Section 1. Specifically, we
let $\lambda$ be a probability measure on $([0,1), \mathcal{B}_{[0,1)})$, representing a “weighting” of a collection of first-order
superquantiles $\bar{G}_X(\alpha)$, $\alpha \in [0,1)$. We refer to $\lambda$ as a weighting measure.

3.2 Definition (mixed superquantile risk measure) For a weighting measure $\lambda$, a measure of risk
$\mathcal{R} : \mathcal{L}^2 \to (-\infty, \infty]$ of the form

$$\mathcal{R}(X) := \int_0^1 \bar{G}_X(\alpha)\,d\lambda(\alpha) \tag{4}$$

is called a mixed superquantile measure of risk³.

If $\lambda$ is concentrated on a finite number of points in $[0,1)$, say $\alpha_1, \alpha_2, \ldots, \alpha_k$, then simply
$\mathcal{R}(X) = \lambda(\{\alpha_1\})\bar{G}_X(\alpha_1) + \ldots + \lambda(\{\alpha_k\})\bar{G}_X(\alpha_k)$. A first-order superquantile risk measure is realized by setting $k = 1$.
The second-order superquantile measure of risk $\bar{\mathcal{R}}_\alpha$ is formed by the weighting measure $\lambda = \lambda_\alpha$, with
$\lambda_\alpha(S) := m(S \cap (\alpha, 1))/(1-\alpha)$ for any $S \in \mathcal{B}_{[0,1)}$. (Here, $m$ is the Lebesgue measure.) In general, since
$\lambda$ is defined on $\mathcal{B}_{[0,1)}$, we exclude the possibility of a weighting measure that places a positive weight at
$\alpha = 1$ because that case simply yields $\mathcal{R}(X) = \infty$ when $\sup X = \infty$, which is better treated separately.

For technical reasons, we exclusively deal with the completion of $([0,1), \mathcal{B}_{[0,1)}, \lambda)$, which we, with a
slight abuse of notation, denote by $([0,1), \mathcal{B}_{[0,1)}, \lambda)$.
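When the weighting measure is concentrated on finitely many probability levels, the mixed superquantile risk measure is a weighted sum of first-order superquantiles. The sketch below evaluates such a sum for an equally weighted sample; representing the weighting measure as a dictionary from levels to weights is an assumption made for this example only.

```python
def superquantile(sample, alpha):
    """First-order superquantile (CVaR) of an equally weighted sample, alpha in [0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        # the quantile function equals x on ((k-1)/n, k/n]; keep only the part above alpha
        overlap = max(0.0, k / n - max((k - 1) / n, alpha))
        total += x * overlap
    return total / (1.0 - alpha)


def mixed_superquantile(sample, weights):
    """Weighted sum of superquantiles; weights maps levels in [0, 1) to nonnegative masses."""
    if any(lam < 0.0 for lam in weights.values()):
        raise ValueError("weights must be nonnegative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one (probability measure)")
    return sum(lam * superquantile(sample, alpha) for alpha, lam in weights.items())


sample = [1.0, 2.0, 3.0, 4.0]
print(mixed_superquantile(sample, {0.0: 1.0}))            # all weight at level 0: the mean
print(mixed_superquantile(sample, {0.0: 0.5, 0.5: 0.5}))  # half mean, half level-0.5 superquantile
```

With all weight at level 0 the measure reduces to the expectation, which is the case in which averseness is lost.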

We are then ready to give the basic properties of a mixed superquantile risk measure. The following
result is a slight extension of [20, 21] by dealing with a relaxed condition for finiteness and the point
$\beta = 0$ explicitly. Also, parts of the proof are new.

3.3 Theorem (mixed superquantile properties) A mixed superquantile risk measure $\mathcal{R}$, see (4), is
well-defined, monotonic and positively homogeneous. It is regular if $\lambda(\{0\}) < 1$, but lacking averseness
if $\lambda(\{0\}) = 1$. Specifically,

$$\mathcal{R}(X) \ge E[X] \text{ for all } X \in \mathcal{L}^2 \text{ and } \mathcal{R}(X) > E[X] \text{ for nonconstant } X \text{ unless } \lambda(\{0\}) = 1.$$

It is finite on $\mathcal{L}^2$ whenever the weighting measure $\lambda$ satisfies

$$\int_0^1 \frac{1}{\sqrt{1-\alpha}}\,d\lambda(\alpha) < \infty$$

and, regardless of the weighting measure, has $\mathcal{R}(X) < \infty$ whenever $\sup X < \infty$.

It has the alternative expression

$$\mathcal{R}(X) = \int_0^1 G_X(\beta)\varphi(\beta)\,d\beta, \quad \text{where } \varphi(\beta) := \int_{0 \le \alpha \le \beta} \frac{1}{1-\alpha}\,d\lambda(\alpha), \quad \beta \in [0,1].$$

The risk profile function $\varphi$ is right-continuous and nondecreasing on $[0,1]$ with $\varphi(0) = 0$ and satisfies
$\int_0^1 (1-\alpha)\,d\varphi(\alpha) = 1$. Conversely, any $\varphi$ with these properties arises from a unique weighting measure
$\lambda$ given by $d\lambda(\alpha) = (1-\alpha)\,d\varphi(\alpha)$.

³ Also called a spectral measure of risk [1] and Choquet representation of distortion acceptability functionals [12].
Proof. For every $X \in \mathcal{L}^2$, $\bar{G}_X$ is continuous and finite on $[0,1)$ and therefore $\mathcal{B}_{[0,1)}$-measurable.
Moreover, $\bar{G}_X \ge E[X]$ and therefore $\mathcal{R}(X) \ge E[X] > -\infty$. Consequently, $\mathcal{R}$ is well-defined with values in
$[E[X], \infty]$. Its regularity and positive homogeneity follow directly from those of $\mathcal{R}_\alpha$; see [19]. Since $\bar{G}_X$
is strictly increasing on $[0,1)$ for nonconstant $X$, we have that if $\lambda(\{0\}) < 1$, then

$$\mathcal{R}(X) = E[X]\lambda(\{0\}) + \int_{1 > \beta > 0} \bar{G}_X(\beta)\,d\lambda(\beta) > E[X]\lambda(\{0\}) + E[X](1 - \lambda(\{0\})) = E[X]$$

and the strict lower bound follows. From Proposition 2.2,

$$\mathcal{R}(X) \le \int_0^1 \left( E[X] + \frac{\sigma(X)}{\sqrt{1-\beta}} \right) d\lambda(\beta) = E[X] + \sigma(X)\int_0^1 \frac{1}{\sqrt{1-\beta}}\,d\lambda(\beta) < \infty$$

under the stated assumption, which establishes the corresponding finiteness on $\mathcal{L}^2$. In the case of
$\sup X < \infty$, finiteness of $\mathcal{R}(X)$ follows trivially.

We next consider the alternative expression. By definition,

$$\mathcal{R}(X) = \int_0^1 \left[ \int_0^1 G_X(\beta)\psi(\alpha, \beta)\,d\beta \right] d\lambda(\alpha), \tag{5}$$

with $\psi(\alpha, \beta) = 1/(1-\alpha)$ if $0 \le \alpha \le \beta < 1$ and $\psi(\alpha, \beta) = 0$ otherwise. We equip $[0,1) \times (0,1)$ with the product
measure $\lambda \times m$ defined on the product sigma-algebra $\mathcal{B}_{[0,1)} \otimes \mathcal{B}_{(0,1)}$. It is obvious that
$\psi : [0,1) \times (0,1) \to \mathbb{R}$ is $(\mathcal{B}_{[0,1)} \otimes \mathcal{B}_{(0,1)})$-measurable and likewise $G_X$, viewed as a function on $[0,1) \times (0,1)$ that is constant
in its first argument, due to its monotonicity. Consequently, the function $(\alpha, \beta) \mapsto G_X(\beta)\psi(\alpha, \beta)$ is
measurable in the same sense. Then, we look toward the interchange of integration order in (5).

We consider three cases. (i) Suppose that $X \ge 0$ a.e. Then, $G_X \ge 0$ and $G_X\psi \ge 0$, and the
interchange of integration order is permitted by Tonelli-Fubini’s Theorem. (ii) Suppose that $X \le 0$
a.e. Then, $-G_X \ge 0$ and $-G_X\psi \ge 0$, and the interchange of integration order is again permitted by
Tonelli-Fubini’s Theorem. (iii) Suppose that neither (i) nor (ii) holds. Then, there exists a $\beta_X \in (0,1)$
such that $G_X(\beta) \ge 0$ for $\beta > \beta_X$ and $G_X(\beta) \le 0$ for $\beta \le \beta_X$. In view of Proposition 2.3, it suffices to
find an integrable, lower-bounding function of $G_X\psi$. Let $g : [0,1) \times (0,1) \to \mathbb{R}$ be given by

$$g(\alpha, \beta) = \begin{cases} G_X(\beta)/(1 - \beta_X) & \text{if } 0 \le \alpha \le \beta \le \beta_X \\ G_X(\beta) & \text{if } 0 \le \alpha \le \beta < 1,\ \beta_X < \beta \\ 0 & \text{otherwise.} \end{cases}$$

Clearly, $G_X\psi \ge g$ and

$$\int |g|\,d(\lambda \times m) \le \frac{1}{1 - \beta_X}\int |G_X|\,d(\lambda \times m) = \frac{1}{1 - \beta_X}\int_0^1 \left[ \int_0^1 |G_X(\beta)|\,d\beta \right] d\lambda(\alpha),$$

where the equality follows by Tonelli-Fubini’s Theorem. The inner integral simplifies to

$$\int_0^1 |G_X(\beta)|\,d\beta = \int_{\beta_X}^1 G_X(\beta)\,d\beta - \int_0^{\beta_X} G_X(\beta)\,d\beta = (1 - \beta_X)\bar{G}_X(\beta_X) - \int_0^{\beta_X} G_X(\beta)\,d\beta. \tag{6}$$
The last term requires further simplification. Recall that for $\alpha \in (0,1)$,

$$\frac{1}{\alpha}\int_0^\alpha G_X(\beta)\,d\beta = -\frac{1}{\alpha}\int_{1-\alpha}^1 G_{-X}(\beta)\,d\beta = -\bar{G}_{-X}(1 - \alpha).$$

Applying this result, the inner integral from above simplifies further to

$$\int_0^1 |G_X(\beta)|\,d\beta = (1 - \beta_X)\bar{G}_X(\beta_X) + \beta_X\bar{G}_{-X}(1 - \beta_X) < \infty.$$

Consequently in view of (6), $g$ is integrable and therefore furnishes the necessary lower-bounding,
integrable function in Proposition 2.3, which completes part (iii). We are therefore permitted to interchange
the order of integration in (5) and get

$$\mathcal{R}(X) = \int_0^1 \left[ \int_0^1 G_X(\beta)\psi(\alpha, \beta)\,d\beta \right] d\lambda(\alpha) = \int_0^1 G_X(\beta) \left[ \int_0^1 \psi(\alpha, \beta)\,d\lambda(\alpha) \right] d\beta = \int_0^1 G_X(\beta)\varphi(\beta)\,d\beta,$$

where the last equality follows from the definition of $\varphi$.

The final assertions follow from recognizing that the Lebesgue-Stieltjes measure $d\varphi$ associated with
a function $\varphi$ has $d\varphi(\alpha) = \frac{1}{1-\alpha}\,d\lambda(\alpha)$ for a weighting measure $\lambda$ on $[0,1)$. □
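The alternative expression in Theorem 3.3 can be verified numerically when both the distribution and the weighting measure are discrete. The sketch below builds the risk profile for a weighting measure supported on finitely many levels and compares the integral of quantile times profile with the weighted sum of superquantiles. The equally weighted sample, the dictionary encoding of the weighting measure, and all function names are assumptions of this example.

```python
import math


def quantile(sample, alpha):
    """Quantile min{x : F(x) >= alpha} of an equally weighted sample, alpha in (0, 1)."""
    xs = sorted(sample)
    k = max(1, math.ceil(alpha * len(xs) - 1e-12))
    return xs[k - 1]


def superquantile(sample, alpha):
    """First-order superquantile (CVaR) of an equally weighted sample, alpha in [0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        overlap = max(0.0, k / n - max((k - 1) / n, alpha))
        total += x * overlap
    return total / (1.0 - alpha)


def risk_profile(weights, beta):
    """Profile value: sum of weight / (1 - level) over all levels not exceeding beta."""
    return sum(lam / (1.0 - alpha) for alpha, lam in weights.items() if alpha <= beta)


def mixed_via_profile(sample, weights):
    """Integral over (0, 1) of quantile times profile; both are step functions here."""
    n = len(sample)
    # breakpoints where either the quantile function or the profile may jump
    points = sorted({k / n for k in range(n + 1)} | set(weights))
    total = 0.0
    for a, b in zip(points, points[1:]):
        mid = 0.5 * (a + b)  # both functions are constant on the open interval (a, b)
        total += quantile(sample, mid) * risk_profile(weights, mid) * (b - a)
    return total


sample = [-3.0, 0.5, 1.0, 2.0, 10.0]
weights = {0.1: 0.3, 0.6: 0.7}
direct = sum(lam * superquantile(sample, alpha) for alpha, lam in weights.items())
print(direct, mixed_via_profile(sample, weights))
```

The two printed numbers agree up to rounding, as the interchange of integration order in the proof predicts.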

Second-order  superquantiles  possess  the  following  properties. 


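3.4 Theorem (second-order superquantile properties) Any second-order superquantile risk measure
$\bar{\mathcal{R}}_\alpha : \mathcal{L}^2 \to \mathbb{R}$, $\alpha \in [0,1)$, is regular, monotonic, and positively homogeneous, and satisfies for $X \in \mathcal{L}^2$

$$E[X] \le \bar{\bar{G}}_X(\alpha) = \bar{\mathcal{R}}_\alpha(X) \le \min\left\{ E[X] + \frac{2\sigma(X)}{\sqrt{1-\alpha}},\ \sup X \right\},$$

with the lower bound holding with strict inequality whenever $X$ is nonconstant.

It has the alternative expressions

$$\bar{\mathcal{R}}_\alpha(X) = \frac{1}{1-\alpha}\int_\alpha^1 G_X(\beta)\log\frac{1-\alpha}{1-\beta}\,d\beta = \int_0^1 G_X(\beta)\varphi_\alpha(\beta)\,d\beta,$$

with respect to the risk profile function

$$\varphi_\alpha(\beta) := \begin{cases} \dfrac{1}{1-\alpha}\log\dfrac{1-\alpha}{1-\beta} & \text{if } \alpha \le \beta < 1 \\ 0 & \text{if } 0 \le \beta < \alpha. \end{cases}$$

Moreover, $\varphi_\alpha$ is a nondecreasing, finite convex function on $[0,1)$ with right-derivative equal to $1/(1-\alpha)^2$
as it starts to grow from 0 at $\beta = \alpha$.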

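Proof. As a special case of Theorem 3.3, it follows automatically that $\bar{\mathcal{R}}_\alpha$ is well-defined, regular,
monotonic, positively homogeneous, and bounded from below by $E[X]$. From Proposition 2.2,

$$\bar{\mathcal{R}}_\alpha(X) \le \frac{1}{1-\alpha}\int_\alpha^1 \left( E[X] + \frac{\sigma(X)}{\sqrt{1-\beta}} \right) d\beta = E[X] + \frac{\sigma(X)}{1-\alpha}\int_\alpha^1 \frac{1}{\sqrt{1-\beta}}\,d\beta = E[X] + \frac{2\sigma(X)}{\sqrt{1-\alpha}}.$$

Obviously, $\bar{\mathcal{R}}_\alpha(X) \le \sup X$ also holds.

The alternative expression follows after a specialization of $\varphi$ of Theorem 3.3 for the given choice of
weighting measure $\lambda = \lambda_\alpha$. Specifically,

$$\varphi(\beta) = \int_{0 \le \gamma \le \beta} \frac{1}{1-\gamma}\,d\lambda_\alpha(\gamma) = \varphi_\alpha(\beta) = \begin{cases} \displaystyle\int_\alpha^\beta \frac{1}{1-\gamma}\,\frac{1}{1-\alpha}\,d\gamma & \text{if } \alpha \le \beta < 1 \\ 0 & \text{if } 0 \le \beta < \alpha. \end{cases}$$

Since for $0 \le a < b < 1$,

$$\int_a^b \frac{1}{1-\beta}\,d\beta = \log\frac{1-a}{1-b},$$

we therefore find that the alternative expressions follow.

The assertion about $\varphi_\alpha$ being convex is justified by its derivative being zero for $\beta \in (0, \alpha)$ and
$1/((1-\alpha)(1-\beta))$ for $\beta \in (\alpha, 1)$, with left- and right-derivatives at $\beta = \alpha$ equal to 0 and $1/(1-\alpha)^2$,
respectively. □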
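For a discrete distribution, the logarithmic risk profile of Theorem 3.4 yields a closed-form second-order superquantile, because the quantile function is piecewise constant and the logarithm has an elementary antiderivative. The following sketch implements that formula for an equally weighted sample and checks the two-sided bound of the theorem. The sample, the helper `_neg_log_antiderivative`, and the other names are assumptions introduced for this example.

```python
import math


def _neg_log_antiderivative(b):
    """Antiderivative of -log(1 - b) on [0, 1], extended by its limit 0 at b = 1."""
    if b >= 1.0:
        return 0.0
    return (1.0 - b) * math.log(1.0 - b) - (1.0 - b)


def second_order_superquantile(sample, alpha):
    """(1/(1-alpha)) * integral over (alpha, 1) of quantile(beta) * log((1-alpha)/(1-beta))."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        a, b = max((k - 1) / n, alpha), k / n
        if b <= a:
            continue  # this piece of the quantile function lies entirely below alpha
        # exact integral of log((1 - alpha) / (1 - beta)) over (a, b)
        piece = (b - a) * math.log(1.0 - alpha)
        piece += _neg_log_antiderivative(b) - _neg_log_antiderivative(a)
        total += x * piece
    return total / (1.0 - alpha)


def theorem_bounds(sample, alpha):
    """Lower bound E[X] and upper bound min{E[X] + 2 sigma / sqrt(1 - alpha), sup X}."""
    n = len(sample)
    mean = sum(sample) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return mean, min(mean + 2.0 * sigma / math.sqrt(1.0 - alpha), max(sample))


sample = [-3.0, 0.5, 1.0, 2.0, 10.0]
for alpha in (0.0, 0.3, 0.6, 0.9):
    lower, upper = theorem_bounds(sample, alpha)
    print(alpha, lower, second_order_superquantile(sample, alpha), upper)
```

Compared with numerical integration of first-order superquantiles, this route avoids quadrature error entirely for discrete distributions.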


The upper bounds on $\mathcal{R}_\alpha$ and $\bar{\mathcal{R}}_\alpha$ in Proposition 2.2 and Theorem 3.4, respectively, are remarkably
similar, and show that although second-order superquantile risks are larger than first-order risks, the
difference is at most $\sigma(X)/\sqrt{1-\alpha}$.


4  Duality  for  Mixed  Superquantile/CVaR  Risk  Measures 

We next turn to the derivation of dual expressions for mixed and second-order superquantile risk
measures. We recall the dual relationship (see for example [19]) between a nonempty closed convex set
$\mathcal{Q} \subset \mathcal{L}^2$, called a risk envelope, and a positively homogeneous, regular risk measure $\mathcal{R}$ through

$$\mathcal{R}(X) = \sup_{Q \in \mathcal{Q}} E[XQ] \text{ for } X \in \mathcal{L}^2, \quad \mathcal{Q} = \{Q \in \mathcal{L}^2 \mid E[XQ] \le \mathcal{R}(X) \text{ for all } X \in \mathcal{L}^2\}.$$

An essential building block for such expressions in the case of mixed superquantile risk measures is the
dual expression for first-order superquantile risk measures, which we review first.

For $\alpha \in [0,1)$, we recall that a first-order superquantile risk measure (see [20, 19]) has

$$\mathcal{R}_\alpha(X) = \sup_{Q \in \mathcal{Q}_\alpha} E[XQ],$$

where the risk envelope is

$$\mathcal{Q}_\alpha := \{Q \in \mathcal{L}^2 \mid 0 \le Q(\omega) \le 1/(1-\alpha) \text{ a.e. } \omega \in \Omega,\ E[Q] = 1\}.$$
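For a finite, equally weighted sample, a maximizer in the dual expression for the first-order superquantile can be written down directly: it puts the largest admissible density on the largest outcomes until the total probability mass 1 - alpha is used up. The sketch below constructs such an element of the risk envelope and confirms that it attains the superquantile. The greedy construction and the names `risk_identifier` and `expectation` are assumptions of this example, not notation from the paper.

```python
def superquantile(sample, alpha):
    """First-order superquantile (CVaR) of an equally weighted sample, alpha in [0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        overlap = max(0.0, k / n - max((k - 1) / n, alpha))
        total += x * overlap
    return total / (1.0 - alpha)


def risk_identifier(sample, alpha):
    """Density Q with 0 <= Q <= 1/(1-alpha) and E[Q] = 1 that maximizes E[XQ]."""
    n = len(sample)
    p = 1.0 / n               # probability of each outcome
    remaining = 1.0 - alpha   # probability mass still to be covered from the top
    q = [0.0] * n
    for i in sorted(range(n), key=lambda j: sample[j], reverse=True):
        mass = min(p, remaining)
        q[i] = mass / (p * (1.0 - alpha))  # equals 1/(1-alpha) when the outcome is fully covered
        remaining = max(0.0, remaining - mass)
    return q


def expectation(values):
    """Expectation under the equally weighted probability measure."""
    return sum(values) / len(values)


sample = [1.0, 2.0, 3.0, 4.0]
alpha = 0.625
q = risk_identifier(sample, alpha)
print(q, expectation(q))
print(expectation([x * w for x, w in zip(sample, q)]), superquantile(sample, alpha))
```

The second printed line shows the dual value and the superquantile side by side; they coincide up to rounding.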

We  also  need  the  following  definitions  and  technical  results. 


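4.1 Definition Let $(T, \mathcal{A}, \mu)$ be a complete measure space, with $\mu$ sigma-finite, $\mathcal{X}$ a separable reflexive
Banach space, and $\mathcal{M}$ a linear subspace of the linear space of all $(\mathcal{A}, \mathcal{B}_{\mathcal{X}})$-measurable functions
$x : T \to \mathcal{X}$. The set $\mathcal{M}$ is $(\mathcal{A}, \mathcal{B}_{\mathcal{X}})$-decomposable if, whenever $x \in \mathcal{M}$ and $x_0 : S \to \mathcal{X}$ is a bounded
$(\mathcal{A}, \mathcal{B}_{\mathcal{X}})$-measurable function on a set $S \in \mathcal{A}$, with $\mu(S) < \infty$, then the function $y : T \to \mathcal{X}$ given by

$$y(t) = \begin{cases} x_0(t) & \text{if } t \in S \\ x(t) & \text{if } t \in T \setminus S \end{cases}$$

also belongs to $\mathcal{M}$.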
4.2 Definition In the notation of Definition 4.1, we say that a function $f : T \times \mathcal{X} \to (-\infty, \infty]$ is a
normal integrand if the following hold:

(i) $f$ is $(\mathcal{A} \otimes \mathcal{B}_{\mathcal{X}})$-measurable and

(ii) for every $t \in T$, $f(t, \cdot)$ is lower semicontinuous on $\mathcal{X}$ and not identical to $\infty$.

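4.3 Proposition Suppose that the conditions and notation of Definition 4.1 hold and
$f : T \times \mathcal{X} \to (-\infty, \infty]$ is a normal integrand. Then, the following hold:

(i) the functions $t \mapsto \inf_{\xi \in \mathcal{X}} f(t, \xi)$ and $t \mapsto f(t, x(t))$, with $x : T \to \mathcal{X}$ $(\mathcal{A}, \mathcal{B}_{\mathcal{X}})$-measurable, are
$\mathcal{A}$-measurable and

(ii) if $\mathcal{M}$ is $(\mathcal{A}, \mathcal{B}_{\mathcal{X}})$-decomposable and there exists an $x \in \mathcal{M}$ such that $\int f(t, x(t))\,d\mu(t) < \infty$, then

$$\inf_{x \in \mathcal{M}} \int f(t, x(t))\,d\mu(t) = \int \varphi(t)\,d\mu(t), \quad \text{where } \varphi(t) = \inf_{\xi \in \mathcal{X}} f(t, \xi). \tag{7}$$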

Proof. First, we consider $t \mapsto \inf_{\xi \in \mathcal{X}} f(t, \xi)$. For measurable spaces $(\mathcal{X}_1, \mathcal{A}_1)$ and $(\mathcal{X}_2, \mathcal{A}_2)$, we recall
that a set-valued mapping $S : \mathcal{X}_1 \rightrightarrows \mathcal{X}_2$ is $(\mathcal{A}_1, \mathcal{A}_2)$-measurable if its graph is measurable in the sense
that

$$\{(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2 \mid x_2 \in S(x_1)\} \in \mathcal{A}_1 \otimes \mathcal{A}_2,$$

where $\mathcal{A}_1 \otimes \mathcal{A}_2$ is the product sigma-algebra generated by $\mathcal{A}_1$ and $\mathcal{A}_2$. Since $f$ is a normal integrand, the
set-valued mapping $t \mapsto \operatorname{epi} f(t, \cdot)$ is $\mathcal{A}$-measurable and closed-valued; see for example [22, Proposition
1]. By [22, Theorem 1(f)], there exists a countable collection $\{g_i\}_{i \in I}$ of $\mathcal{A}$-measurable functions
$g_i : T \to \mathcal{X} \times \mathbb{R}$ of the form $g_i(t) = (x_i(t), \alpha_i(t))$, $x_i(t) \in \mathcal{X}$ and $\alpha_i(t) \in \mathbb{R}$, such that

$$\operatorname{epi} f(t, \cdot) = \operatorname{cl}\{g_i(t)\}_{i \in I} \text{ for all } t \in T,$$

where cl denotes closure. The mapping $t \mapsto \alpha_i(t)$ is also $\mathcal{A}$-measurable. Consequently,

$$\inf_{\xi \in \mathcal{X}} f(t, \xi) = \inf_{i \in I} \alpha_i(t) \text{ for all } t \in T$$

and the conclusion follows from the fact that the pointwise infimum of a countable collection of
measurable functions is a measurable function.

Second, we consider $t \mapsto f(t, x(t))$, which is a composition of $f$ with the measurable mapping
$t \mapsto (t, x(t))$ and therefore measurable.

Third, we establish part (ii) by following the arguments in the proof of Theorem 2 in [22]. By
assumption there exists a function $x_1 \in \mathcal{M}$ and a $\mu$-integrable function $\alpha_1 : T \to \mathbb{R}$ such that

$$f(t, x_1(t)) \le \alpha_1(t) \text{ for every } t \in T.$$

Since $\varphi(t) \le f(t, x(t))$ for every function $x \in \mathcal{M}$ and $t \in T$ by definition and $\varphi$ is $\mathcal{A}$-measurable by
part (i), the integral of $\varphi$ is well-defined and either finite or equals $-\infty$. Consequently, the inequality
$\ge$ holds in (7). Now, let $\gamma \in \mathbb{R}$ be such that

$$\int \varphi(t)\,d\mu(t) < \gamma. \tag{8}$$
We will prove the existence of a function $x \in \mathcal{M}$ such that

$$\int f(t, x(t))\,d\mu(t) < \gamma, \tag{9}$$

thereby establishing part (ii). From (8) and the properties of $(T, \mathcal{A}, \mu)$, there exists a $\mu$-integrable
function $\alpha_0 : T \to \mathbb{R}$ such that $\varphi(t) < \alpha_0(t)$ for every $t \in T$ and

$$\int \alpha_0(t)\,d\mu(t) < \gamma. \tag{10}$$

We define the set-valued mapping $S : T \rightrightarrows \mathcal{X}$ by

$$S(t) = \{\xi \in \mathcal{X} \mid f(t, \xi) \le \alpha_0(t)\} \text{ for } t \in T.$$

Since the function $(t, \xi) \mapsto f(t, \xi) - \alpha_0(t)$ is $(\mathcal{A} \otimes \mathcal{B}_{\mathcal{X}})$-measurable, $S$ is also $\mathcal{A}$-measurable. Moreover,
$S(t)$ is for each $t \in T$ closed and nonempty. Since $S$ is $\mathcal{A}$-measurable, there exists an $\mathcal{A}$-measurable
selection $x_0$, i.e., an $\mathcal{A}$-measurable function $x_0$ such that $x_0(t) \in S(t)$ for every $t \in T$; see for example the
corollary of Theorem 1 in [22]. Since (10) holds, there exists a measurable set $T_0 \subset T$, with $\mu(T_0) < \infty$,
such that

$$\int_{T_0} \alpha_0(t)\,d\mu(t) + \int_{T \setminus T_0} \alpha_1(t)\,d\mu(t) < \gamma. \tag{11}$$

By the construction of $S$ in terms of $\alpha_0$, the measurable selection $x_0$ can be chosen to be bounded on
$T_0$. Let $x : T \to \mathcal{X}$ be such that $x(t) = x_0(t)$ for $t \in T_0$ and $x(t) = x_1(t)$ for $t \in T \setminus T_0$. Then, $x \in \mathcal{M}$ by
the assumption of decomposability, and we have that $f(t, x(t)) \le \alpha_0(t)$ for $t \in T_0$ and $f(t, x(t)) \le \alpha_1(t)$
for $t \in T \setminus T_0$. From (11) we then conclude (9), which establishes part (ii). □


4.4 Lemma If $q : [0,1) \to \mathcal{L}^2$ is $(\mathcal{B}_{[0,1)}, \mathcal{B}_{\mathcal{L}^2})$-measurable, then

(i) the function $f_1 : [0,1) \times \Omega \to \mathbb{R}$ given by $f_1(\beta, \omega) = q(\beta)(\omega)$ is $(\mathcal{B}_{[0,1)} \otimes \mathcal{F})$-measurable, and

(ii) the function $f_2 : [0,1) \to \mathbb{R}$ given by $f_2(\beta) = \|q(\beta)\|_2$ is $\mathcal{B}_{[0,1)}$-measurable.

Proof. For part (i) simply observe that $f_1 = g \circ h$, where $h : [0,1) \times \Omega \to \mathcal{L}^2 \times \Omega$, with
$h(\alpha, \omega) = (q(\alpha), \omega)$, and $g : \mathcal{L}^2 \times \Omega \to \mathbb{R}$, with $g(Q, \omega) = Q(\omega)$. The conclusion then follows from the measurability
of $q$ and elements of $\mathcal{L}^2$, and the fact that composition of measurable functions is measurable. Next,
we consider part (ii). A trivial extension of part (i) establishes that the function $(\beta, \omega) \mapsto [q(\beta)(\omega)]^2$
is $(\mathcal{B}_{[0,1)} \otimes \mathcal{F})$-measurable. Since it is also nonnegative, it follows from Tonelli-Fubini’s Theorem that
$[f_2(\cdot)]^2$ is $\mathcal{B}_{[0,1)}$-measurable. □


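In preparation for returning to our application, we define a specific class of integrable mappings.
Let

$$\mathcal{M} := \left\{ q : [0,1) \to \mathcal{L}^2 \;\middle|\; q \text{ is } (\mathcal{B}_{[0,1)}, \mathcal{B}_{\mathcal{L}^2})\text{-measurable},\ \int \|q(\beta)\|_2\,d\lambda(\beta) < \infty \right\}.$$

We note that $\mathcal{M}$ is well-defined because by Lemma 4.4, the mapping $\beta \mapsto \|q(\beta)\|_2$ is $\mathcal{B}_{[0,1)}$-measurable
whenever $q$ is $(\mathcal{B}_{[0,1)}, \mathcal{B}_{\mathcal{L}^2})$-measurable.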


4.5 Proposition The set $\mathcal{M}$ is $(\mathcal{B}_{[0,1)}, \mathcal{B}_{\mathcal{L}^2})$-decomposable.

Proof. This fact is a direct consequence of Definition 4.1. □

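We are now ready to return to the risk envelope of a mixed superquantile risk measure $\mathcal{R}$ and define
a collection of random variables in terms of (Bochner) integrals of elements of $\mathcal{M}$. Let

$$\mathcal{Q} := \operatorname{cl}\left\{ Q \in \mathcal{L}^2 \;\middle|\; Q = \int q(\beta)\,d\lambda(\beta),\ q \in \mathcal{M},\ q(\beta) \in \mathcal{Q}_\beta \text{ for } \lambda\text{-a.e. } \beta \in [0,1) \right\},$$

where cl denotes closure with respect to the (strong) topology on $\mathcal{L}^2$. We note that $\mathcal{Q}$ resembles the
Aumann integral (see for example [3]) of the set-valued mapping $\beta \mapsto \mathcal{Q}_\beta$.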


4.6 Theorem (risk envelope for mixed superquantile) The set $\mathcal{Q} \subset \mathcal{L}^2$ is nonempty, convex, and is the risk envelope of $\mathcal{R}$, i.e., for any $X \in \mathcal{L}^2$,

$$\mathcal{R}(X) = \sup_{Q \in \mathcal{Q}} E[XQ].$$

Moreover, whenever $\int 1/\sqrt{1-\alpha}\, d\lambda(\alpha) < \infty$, it is also weakly compact.

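Proof. Let $X \in \mathcal{L}^2$ and $f : [0,1) \times \mathcal{L}^2 \to \overline{\mathbb{R}}$ be defined by

$$f(\beta,Q) = \begin{cases} -E[XQ] & \text{if } Q \in \mathcal{Q}_\beta \\ \infty & \text{otherwise.} \end{cases}$$

In view of Definition 4.2, $f$ is a normal integrand because (i) $f$ is $(\mathcal{B}_{[0,1)} \otimes \mathcal{B}_{\mathcal{L}^2})$-measurable as the sum of the continuous$^4$ function $-E[X\,\cdot\,]$ on $[0,1) \times \mathcal{L}^2$ and an indicator function vanishing on the set

$$\{(\beta,Q) \in [0,1) \times \mathcal{L}^2 \mid Q \in \mathcal{Q}_\beta\} \in \mathcal{B}_{[0,1)} \otimes \mathcal{B}_{\mathcal{L}^2}$$

and infinity elsewhere, (ii) $f(\beta,Q) \ge -E[XQ] > -\infty$ for $\beta \in [0,1)$ and $Q \in \mathcal{L}^2$, and (iii) for all $\beta \in [0,1)$, $f(\beta,\cdot)$ is lower semicontinuous by the continuity of $E[X\,\cdot\,]$ on $\mathcal{L}^2$ and the closedness of $\mathcal{Q}_\beta \subset \mathcal{L}^2$, and $f(\beta,\cdot)$ is not identical to $\infty$ with $Q = 1 \in \mathcal{Q}_\beta$ furnishing a finite value $f(\beta,1) = -E[X]$. In view of Proposition 4.5 and the fact that $q \equiv 1$ provides an element of $\mathcal{M}$ with $\int f(\beta,q(\beta))\, d\lambda(\beta) = -E[X] < \infty$, Proposition 4.3 applies. Consequently, the interchange of integration and minimization is permitted and we obtain that

$$\mathcal{R}(X) = \int \sup_{Q_\beta \in \mathcal{Q}_\beta} E[XQ_\beta]\, d\lambda(\beta) = -\int \inf_{Q \in \mathcal{L}^2} f(\beta,Q)\, d\lambda(\beta) = -\inf_{q \in \mathcal{M}} \int f(\beta,q(\beta))\, d\lambda(\beta).$$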

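We next consider the interchange of integration with respect to $\lambda$ and $P$. For $q \in \mathcal{M}$, it follows from Lemma 4.4 that the function $(\beta,\omega) \mapsto |X(\omega)q(\beta)(\omega)|$ is measurable. By Tonelli-Fubini's Theorem and the Cauchy-Schwarz inequality,

$$\int |X(\omega)q(\beta)(\omega)|\, d(\lambda \times P)(\beta,\omega) = \int E[|Xq(\beta)|]\, d\lambda(\beta) \le \|X\|_2 \int \|q(\beta)\|_2\, d\lambda(\beta) < \infty,$$

where the finiteness follows by the property of $q \in \mathcal{M}$. Then by Tonelli-Fubini's Theorem,

$$\int E[Xq(\beta)]\, d\lambda(\beta) = E\Big[X \int q(\beta)\, d\lambda(\beta)\Big].$$

Since

$$\int f(\beta,q(\beta))\, d\lambda(\beta) = -\int E[Xq(\beta)]\, d\lambda(\beta)$$

whenever $q \in \mathcal{M}$ is such that $q(\beta) \in \mathcal{Q}_\beta$ for $\lambda$-a.e. $\beta \in [0,1)$ and $\int f(\beta,q(\beta))\, d\lambda(\beta) = \infty$ otherwise, we find that

$$\mathcal{R}(X) = -\inf_{q \in \mathcal{M}} \int f(\beta,q(\beta))\, d\lambda(\beta) = \sup_{q \in \mathcal{M}} \Big\{ E\Big[X \int q(\beta)\, d\lambda(\beta)\Big] - \iota(q) \Big\} = \sup_{Q \in \mathcal{Q}} E[XQ],$$

where $\iota(q)$ denotes the indicator that equals $0$ when $q(\beta) \in \mathcal{Q}_\beta$ for $\lambda$-a.e. $\beta \in [0,1)$ and $\infty$ otherwise.

The convexity of $\mathcal{Q}$ follows from the convexity of $\mathcal{Q}_\beta$. Since $1 \in \mathcal{Q}$, $\mathcal{Q}$ is not empty. Under the additional assumption that $\int 1/\sqrt{1-\alpha}\, d\lambda(\alpha) < \infty$, $\mathcal{R}$ is finite-valued on $\mathcal{L}^2$ and even locally bounded around the origin of $\mathcal{L}^2$ by Theorem 3.3. This local boundedness for a positively homogeneous convex function, as the support function of a set $\mathcal{Q}$, corresponds to that set being bounded. Consequently, $\mathcal{Q}$ is bounded. Since $\mathcal{Q}$ is convex, weak closedness follows from strong closedness and therefore weak compactness is established. □

$^4$ Here continuity is with respect to the product topology of the norm-topologies on $[0,1)$ and $\mathcal{L}^2$.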
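The dual representation of Theorem 4.6 can be illustrated in the finitely supported case. The sketch below is not part of the paper's development: it assumes a finite sample space with probabilities $p > 0$, a weighting measure $\lambda$ that places weights on finitely many levels $\beta_i$, and the standard description of the first-order envelope, $\mathcal{Q}_\beta = \{Q \mid 0 \le Q \le 1/(1-\beta),\ E[Q] = 1\}$. The function names `cvar_envelope_maximizer` and `mixed_superquantile` are hypothetical. Each inner supremum is a fractional knapsack problem, so a greedy fill over the largest outcomes solves it.

```python
import numpy as np

def cvar_envelope_maximizer(x, p, beta):
    """Maximize E[XQ] over {0 <= Q <= 1/(1-beta), E[Q] = 1} on a finite
    sample space with outcomes x and probabilities p > 0 (greedy fill)."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    cap = 1.0 / (1.0 - beta)      # pointwise upper bound on Q
    Q = np.zeros_like(x)
    budget = 1.0                  # amount of E[Q] still to be allocated
    for i in np.argsort(-x):      # visit the largest outcomes first
        Q[i] = min(cap, budget / p[i])
        budget -= Q[i] * p[i]
        if budget <= 1e-15:
            break
    return Q

def mixed_superquantile(x, p, betas, weights):
    """Mixed superquantile risk for lambda = sum_i weights[i] * delta_{betas[i]},
    returned together with the mixture Q = sum_i weights[i] * Q_{beta_i}."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    Q = sum(w * cvar_envelope_maximizer(x, p, b) for w, b in zip(weights, betas))
    return float(p @ (x * Q)), Q
```

The returned mixture is an element of the set whose closure defines $\mathcal{Q}$, and the returned value is $E[XQ]$ for that element, which is the integral of the inner suprema with respect to this discrete $\lambda$.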

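For the special case of a second-order superquantile risk measure we then obtain the following corollary.

4.7 Corollary For $\alpha \in [0,1)$, the risk envelope of $\bar{\mathcal{R}}_\alpha$ is given by

$$\bar{\mathcal{Q}}_\alpha := \mathrm{cl}\Big\{ Q \in \mathcal{L}^2 \;\Big|\; Q = \frac{1}{1-\alpha} \int_\alpha^1 q(\beta)\, d\beta,\ q \in \mathcal{M},\ q(\beta) \in \mathcal{Q}_\beta \text{ for } m\text{-a.e. } \beta \in [\alpha,1) \Big\}.$$

Moreover, $\bar{\mathcal{Q}}_\alpha$ is a nonempty weakly-compact convex subset of $\mathcal{L}^2$. □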

In addition to the trivial cases when $\lambda$ and/or $P$ are positive only on a finite number of points in $[0,1)$ and $\Omega$, respectively, the closure in the definition of $\mathcal{Q}$ (and $\bar{\mathcal{Q}}_\alpha$) is unnecessary under the following condition.

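4.8 Proposition Suppose that $\lambda$ is nonatomic and $\int 1/(1-\alpha)\, d\lambda(\alpha) < \infty$. Then,

$$\mathcal{Q} = \Big\{ Q \in \mathcal{L}^2 \;\Big|\; Q = \int q(\beta)\, d\lambda(\beta),\ q \in \mathcal{M},\ q(\beta) \in \mathcal{Q}_\beta \text{ for } \lambda\text{-a.e. } \beta \in [0,1) \Big\}.$$

Proof. By [4], an integrably bounded $\mathcal{B}_{[0,1)}$-measurable set-valued mapping $S : [0,1) \rightrightarrows \mathcal{L}^2$, with closed and convex values, satisfies

$$\mathrm{cl}\Big\{ \int S(\alpha)\, d\lambda(\alpha) \Big\} = \int S(\alpha)\, d\lambda(\alpha)$$

when $\lambda$ is nonatomic. Take $S$ to be the mapping $\alpha \mapsto \{q(\alpha) \mid q \in \mathcal{M},\ q(\alpha) \in \mathcal{Q}_\alpha\}$, which obviously is closed and convex valued by the properties of $\mathcal{Q}_\alpha$. Moreover, since both $[0,1)$ and $\mathcal{L}^2$ are separable, there exists a countable collection $\{q^i\}_{i=1}^\infty$, $q^i \in \mathcal{M}$, such that $S(\alpha) = \mathrm{cl}\{q^i(\alpha) \mid i = 1,2,\ldots\}$ for $\lambda$-a.e. $\alpha \in [0,1)$. Thus, $S$ is $\mathcal{B}_{[0,1)}$-measurable; see for example [22, Theorem 1]. The mapping $S$ is integrably bounded if there exists a $\mathcal{B}_{[0,1)}$-measurable $g : [0,1) \to \mathbb{R}$ with $\int g(\alpha)\, d\lambda(\alpha) < \infty$ and

$$\sup_{Q \in S(\alpha)} \|Q\|_2 \le g(\alpha) \text{ for } \lambda\text{-a.e. } \alpha \in [0,1).$$

Since for our choice of $S$ we have that every $Q \in S(\alpha)$ has $Q(\omega) \le 1/(1-\alpha)$ for a.e. $\omega \in \Omega$, integrable boundedness holds with $g(\alpha) = 1/(1-\alpha)$ under the imposed restriction on $\lambda$. □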

Next, we turn to specific expressions for risk identifiers. Recall that for any $X \in \mathcal{L}^2$ and positively homogeneous regular measure of risk on $\mathcal{L}^2$, a $Q$ in the risk envelope of the risk measure that maximizes $E[XQ]$ is called a risk identifier at $X$. We again start with the building blocks from first-order superquantile risk measures.

For $X \in \mathcal{L}^2$, the set

$$\mathcal{Q}_\alpha^X := \operatorname*{argmax}_{Q \in \mathcal{Q}_\alpha} E[XQ]$$

is convex and nonempty with its elements referred to as risk identifiers of $\mathcal{R}_\alpha$. Before we characterize these risk identifiers, we introduce additional notation.

For $\beta \in (0,1)$, let

$$\Omega_\beta(X) := \{\omega \in \Omega \mid X(\omega) = G_X(\beta)\}$$

and let

$$F_X^-(x) := \lim_{x' \nearrow x} F_X(x'), \quad x \in \mathbb{R},$$

be the left-continuous "companion" of the cumulative distribution function $F_X$, where the limit exists by virtue of $F_X$ being nondecreasing and bounded from above. For $F_X$ continuous, $F_X = F_X^-$ of course.

The risk identifiers of $\mathcal{R}_\alpha$ are then characterized as follows; see also [25, Equation 4.21] for closely related expressions.

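4.9 Proposition For $X \in \mathcal{L}^2$ and $\beta \in (0,1)$, let $r_\beta^X \in \mathcal{L}^2$ be such that

$$0 \le r_\beta^X(\omega) \le \frac{1}{1-\beta} \text{ for a.e. } \omega \in \Omega \quad \text{and} \quad \int_{\Omega_\beta(X)} r_\beta^X(\omega)\, dP(\omega) = \frac{F_X(G_X(\beta)) - \beta}{1-\beta}. \tag{12}$$

Every such $r_\beta^X$ defines a unique$^5$ $Q_\beta^{X,r_\beta^X} \in \mathcal{L}^2$ given for a.e. $\omega \in \Omega$ by

$$Q_\beta^{X,r_\beta^X}(\omega) := \begin{cases} \dfrac{1}{1-\beta} & \text{if } X(\omega) > G_X(\beta) \\ r_\beta^X(\omega) & \text{if } X(\omega) = G_X(\beta) \text{ and } P(\{\omega\}) > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

Then,

$$\mathcal{Q}_\beta^X = \Big\{ Q \in \mathcal{L}^2 \;\Big|\; Q = Q_\beta^{X,r_\beta^X} \text{ for some } r_\beta^X \in \mathcal{L}^2 \text{ satisfying (12)} \Big\}.$$

Moreover,

$$\mathcal{Q}_0^X = \{Q \in \mathcal{L}^2 \mid Q(\omega) = 1 \text{ for a.e. } \omega \in \Omega\}.$$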


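Proof. Let $\beta \in (0,1)$ and $X \in \mathcal{L}^2$. We first show that there exists an $r_\beta^X \in \mathcal{L}^2$ satisfying (12). For $\omega \in \Omega$ satisfying $X(\omega) = G_X(\beta)$ and $P(\{\omega\}) > 0$, $F_X^-(X(\omega)) \le \beta \le F_X(X(\omega))$, with at least one of the inequalities being strict, and

$$\frac{F_X(X(\omega)) - \beta}{(1-\beta)\big(F_X(X(\omega)) - F_X^-(X(\omega))\big)} \in [0, 1/(1-\beta)].$$

Let $\bar r_\beta^X \in \mathcal{L}^2$ be defined for a.e. $\omega \in \Omega$ by

$$\bar r_\beta^X(\omega) := \begin{cases} \dfrac{F_X(X(\omega)) - \beta}{(1-\beta)\big(F_X(X(\omega)) - F_X^-(X(\omega))\big)} & \text{if } X(\omega) = G_X(\beta) \text{ and } P(\{\omega\}) > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

Clearly, $\bar r_\beta^X$ satisfies $0 \le \bar r_\beta^X(\omega) \le 1/(1-\beta)$ for a.e. $\omega \in \Omega$. Moreover,

$$\int_{\Omega_\beta(X)} \bar r_\beta^X(\omega)\, dP(\omega) = \int_{\Omega_\beta(X)} \frac{F_X(G_X(\beta)) - \beta}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)}\, dP(\omega) = \frac{F_X(G_X(\beta)) - \beta}{1-\beta}$$

and $\bar r_\beta^X$ therefore satisfies (12).

Let $r_\beta^X \in \mathcal{L}^2$ satisfy (12). Since $0 \le Q_\beta^{X,r_\beta^X}(\omega) \le 1/(1-\beta)$ for a.e. $\omega \in \Omega$ and

$$\int Q_\beta^{X,r_\beta^X}(\omega)\, dP(\omega) = \int_{\{\omega \in \Omega \mid X(\omega) > G_X(\beta)\}} \frac{1}{1-\beta}\, dP(\omega) + \int_{\Omega_\beta(X)} r_\beta^X(\omega)\, dP(\omega) = \frac{1 - F_X(G_X(\beta))}{1-\beta} + \frac{F_X(G_X(\beta)) - \beta}{1-\beta} = 1,$$

we find that $Q_\beta^{X,r_\beta^X} \in \mathcal{Q}_\beta$. Moreover,

$$\begin{aligned} E\big[X Q_\beta^{X,r_\beta^X}\big] &= \int_{\{\omega \in \Omega \mid X(\omega) > G_X(\beta)\}} \frac{X(\omega)}{1-\beta}\, dP(\omega) + \int_{\Omega_\beta(X)} X(\omega)\, r_\beta^X(\omega)\, dP(\omega) \\ &= \frac{1}{1-\beta} \int_{\{\omega \in \Omega \mid X(\omega) > G_X(\beta)\}} X(\omega)\, dP(\omega) + G_X(\beta)\, \frac{F_X(G_X(\beta)) - \beta}{1-\beta} \\ &= \int_{-\infty}^{\infty} x\, dF_X^\beta(x) \end{aligned}$$

in the notation of (2) and therefore coincides with the alternative expression for $\bar G_X(\beta)$, which proves that $Q_\beta^{X,r_\beta^X}$ maximizes $E[X\,\cdot\,]$ over $\mathcal{Q}_\beta$. Any $Q \in \mathcal{Q}_\beta$ not equal to $Q_\beta^{X,r_\beta^X}$ for any $r_\beta^X$ must necessarily have $E[XQ] < \bar G_X(\beta)$.

The case of $\beta = 0$ follows also as then $\mathcal{Q}_0 = \{Q \in \mathcal{L}^2 \mid 0 \le Q(\omega) \le 1 \text{ for a.e. } \omega \in \Omega,\ E[Q] = 1\}$. □

$^5$ With $\mathcal{L}^2$ consisting of equivalence classes of functions identical up to a set of $P$-measure zero, uniqueness of course is in the sense of such equivalence classes.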
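For a finite sample space, the particular choice (14) can be evaluated directly from the cumulative distribution function. The following is a minimal sketch under the assumption that the atoms have outcomes `x` and probabilities `p > 0`; the function name `first_order_identifier` is hypothetical and chosen only for this illustration.

```python
import numpy as np

def first_order_identifier(x, p, beta):
    """Risk identifier of the superquantile/CVaR at level beta in [0, 1),
    built from (13) with the choice (14), for outcomes x with probabilities p > 0."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    if beta == 0.0:
        return np.ones_like(x)               # at beta = 0 the identifier is Q = 1
    vals, inv = np.unique(x, return_inverse=True)
    mass = np.bincount(inv, weights=p)       # P(X = v) for each distinct value v
    cdf = np.minimum(np.cumsum(mass), 1.0)   # F_X at the distinct values
    k = min(np.searchsorted(cdf, beta), len(vals) - 1)
    G = vals[k]                              # quantile G_X(beta)
    r_bar = (cdf[k] - beta) / ((1.0 - beta) * mass[k])   # formula (14)
    return np.where(x > G, 1.0 / (1.0 - beta), np.where(x == G, r_bar, 0.0))
```

With four equally likely outcomes 1, 2, 3, 4 and $\beta = 0.6$, the quantile is 3, the weight on that atom is $(0.75 - 0.6)/(0.4 \cdot 0.25) = 1.5$, and the weight on the atom above it is $1/0.4 = 2.5$, so that $E[Q] = 1$ and $E[XQ] = 3.625$.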
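A particular element of $\mathcal{Q}_\beta^X$ plays a central role in the following. Let $\bar r_\beta^X \in \mathcal{L}^2$ be as defined in (14). Consequently by Proposition 4.9, $\bar Q_\beta^X$ defined for a.e. $\omega \in \Omega$ by

$$\bar Q_\beta^X(\omega) := Q_\beta^{X,\bar r_\beta^X}(\omega) = \begin{cases} \dfrac{1}{1-\beta} & \text{if } X(\omega) > G_X(\beta) \\ \bar r_\beta^X(\omega) & \text{if } X(\omega) = G_X(\beta) \text{ and } P(\{\omega\}) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{15}$$

is a point in $\mathcal{Q}_\beta^X$. Moreover, let $\bar Q_0^X \in \mathcal{L}^2$ be defined by $\bar Q_0^X(\omega) = 1$ for a.e. $\omega \in \Omega$, which therefore by Proposition 4.9 is a point in $\mathcal{Q}_0^X$. The random variable $\bar Q_\beta^X$ behaves continuously in $\beta$ in a sense given next.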


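4.10 Proposition If $\beta^\nu, \beta \in [0,1)$ and $\beta^\nu \to \beta$, then for any $X \in \mathcal{L}^2$, $\|\bar Q_{\beta^\nu}^X - \bar Q_\beta^X\|_2 \to 0$.

Proof. Let $X \in \mathcal{L}^2$ and $\bar r_\beta^X$ be defined in (14) and $\beta \in (0,1)$. Suppose that $F_X(G_X(\beta)) - F_X^-(G_X(\beta)) > 0$. We consider two cases.

First, suppose that $\beta^\nu \to \beta$, with $\beta^\nu < \beta$ for all $\nu$, which implies that $\beta \in [F_X^-(G_X(\beta)), F_X(G_X(\beta))]$. If $\beta \in (F_X^-(G_X(\beta)), F_X(G_X(\beta))]$, then $G_X(\beta^\nu) = G_X(\beta)$ for sufficiently large $\nu$. Consequently, for sufficiently large $\nu$,

$$\begin{aligned} \|\bar Q_{\beta^\nu}^X - \bar Q_\beta^X\|_2^2 &= \int_{\{\omega \mid X(\omega) < G_X(\beta)\}} (0 - 0)^2\, dP(\omega) + \int_{\Omega_\beta(X)} \big(\bar r_{\beta^\nu}^X(\omega) - \bar r_\beta^X(\omega)\big)^2\, dP(\omega) \\ &\quad + \int_{\{\omega \mid X(\omega) > G_X(\beta)\}} \Big(\frac{1}{1-\beta^\nu} - \frac{1}{1-\beta}\Big)^2 dP(\omega). \end{aligned}$$

When $X(\omega) = G_X(\beta^\nu) = G_X(\beta)$,

$$\bar r_{\beta^\nu}^X(\omega) = \frac{F_X(G_X(\beta)) - \beta^\nu}{(1-\beta^\nu)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} \to \frac{F_X(G_X(\beta)) - \beta}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} = \bar r_\beta^X(\omega).$$

Hence, all three terms in the above integral vanish as $\nu \to \infty$. If $\beta = F_X^-(G_X(\beta))$, then we only have that $G_X(\beta^\nu) \nearrow G_X(\beta)$ by the left-continuity of $G_X$ and in fact $G_X(\beta^\nu) < G_X(\beta)$ for all $\nu$. Consequently,

$$\begin{aligned} \|\bar Q_{\beta^\nu}^X - \bar Q_\beta^X\|_2^2 &= \int_{\{\omega \mid X(\omega) < G_X(\beta^\nu)\}} (0 - 0)^2\, dP(\omega) + \int_{\{\omega \mid G_X(\beta^\nu) = X(\omega) < G_X(\beta)\}} \big(\bar r_{\beta^\nu}^X(\omega) - 0\big)^2\, dP(\omega) \\ &\quad + \int_{\{\omega \mid G_X(\beta^\nu) < X(\omega) = G_X(\beta)\}} \Big(\frac{1}{1-\beta^\nu} - \bar r_\beta^X(\omega)\Big)^2 dP(\omega) \\ &\quad + \int_{\{\omega \mid G_X(\beta^\nu) < G_X(\beta) < X(\omega)\}} \Big(\frac{1}{1-\beta^\nu} - \frac{1}{1-\beta}\Big)^2 dP(\omega). \end{aligned}$$

Of the four integrals, the first and fourth ones obviously tend to zero. For the second one, we see that

$$P(\{\omega \mid G_X(\beta^\nu) = X(\omega) < G_X(\beta)\}) = F_X(G_X(\beta^\nu)) - F_X^-(G_X(\beta^\nu)) \le F_X^-(G_X(\beta)) - F_X^-(G_X(\beta^\nu)) \to 0$$

by the left-continuity of $F_X^-$ and consequently the integral also tends to zero. For the third integral, we find that when $X(\omega) = G_X(\beta)$

$$\bar r_\beta^X(\omega) = \frac{F_X(G_X(\beta)) - \beta}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} = \frac{F_X(G_X(\beta)) - F_X^-(G_X(\beta))}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} = \frac{1}{1-\beta}.$$

Consequently, the third integral also tends to zero.

Second, suppose that $\beta^\nu \to \beta$, with $\beta^\nu > \beta$ for all $\nu$. If $\beta \in [F_X^-(G_X(\beta)), F_X(G_X(\beta)))$, then $G_X(\beta^\nu) = G_X(\beta)$ for sufficiently large $\nu$ and the corresponding argument for the first case still holds. If $\beta = F_X(G_X(\beta))$, then we only have that $G_X(\beta^\nu) > G_X(\beta)$ for all $\nu$. Consequently,

$$\begin{aligned} \|\bar Q_{\beta^\nu}^X - \bar Q_\beta^X\|_2^2 &= \int_{\{\omega \mid X(\omega) < G_X(\beta)\}} (0 - 0)^2\, dP(\omega) + \int_{\{\omega \mid G_X(\beta) = X(\omega) < G_X(\beta^\nu)\}} \big(0 - \bar r_\beta^X(\omega)\big)^2\, dP(\omega) \\ &\quad + \int_{\{\omega \mid G_X(\beta) < X(\omega) = G_X(\beta^\nu)\}} \Big(\bar r_{\beta^\nu}^X(\omega) - \frac{1}{1-\beta}\Big)^2 dP(\omega) \\ &\quad + \int_{\{\omega \mid G_X(\beta) < G_X(\beta^\nu) < X(\omega)\}} \Big(\frac{1}{1-\beta^\nu} - \frac{1}{1-\beta}\Big)^2 dP(\omega). \end{aligned}$$

The first and fourth integrals obviously tend to zero. For the second one,

$$\bar r_\beta^X(\omega) = \frac{F_X(G_X(\beta)) - \beta}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} = \frac{F_X(G_X(\beta)) - F_X(G_X(\beta))}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} = 0$$

and consequently a zero integral. For the third integral,

$$\bar r_{\beta^\nu}^X(\omega) = \frac{F_X(G_X(\beta^\nu)) - \beta^\nu}{(1-\beta^\nu)\big(F_X(G_X(\beta^\nu)) - F_X^-(G_X(\beta^\nu))\big)} \to \frac{1}{1-\beta}$$

if $G_X(\beta^\nu)$ remains bounded away from $G_X(\beta)$ because then $F_X^-(G_X(\beta^\nu)) \to F_X(G_X(\beta)) = \beta$. If $G_X(\beta^\nu) \to G_X(\beta)$, then by the right-continuity of $F_X$ we have that

$$P(\{\omega \in \Omega \mid G_X(\beta) < X(\omega) = G_X(\beta^\nu)\}) = F_X(G_X(\beta^\nu)) - F_X^-(G_X(\beta^\nu)) \le F_X(G_X(\beta^\nu)) - F_X(G_X(\beta)) \to 0.$$

Consequently, the third integral also tends to zero.

The situation with $F_X(G_X(\beta)) - F_X^-(G_X(\beta)) = 0$ follows with similar and in fact simplified arguments as in that case $F_X$ is continuous at $G_X(\beta)$ and $G_X$ is continuous at $\beta$.

Finally, we consider the case with $\beta = 0$ and $\beta^\nu \searrow 0$. Then,

$$\begin{aligned} \|\bar Q_{\beta^\nu}^X - \bar Q_0^X\|_2^2 &= \int_{\{\omega \mid X(\omega) > G_X(\beta^\nu)\}} \Big(\frac{1}{1-\beta^\nu} - 1\Big)^2 dP(\omega) + \int_{\Omega_{\beta^\nu}(X)} \big(\bar r_{\beta^\nu}^X(\omega) - 1\big)^2\, dP(\omega) \\ &\quad + \int_{\{\omega \mid X(\omega) < G_X(\beta^\nu)\}} (0 - 1)^2\, dP(\omega). \end{aligned}$$

Since $1/(1-\beta^\nu) \to 1$, the first integral vanishes. The last two integrals vanish since their integrands are bounded and $F_X(G_X(\beta^\nu)) \to 0$. □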

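We are then in a position to characterize risk identifiers of mixed superquantile risk measures. For $X \in \mathcal{L}^2$, let

$$\mathcal{Q}^X := \mathrm{cl}\Big\{ Q \in \mathcal{L}^2 \;\Big|\; Q = \int q(\beta)\, d\lambda(\beta),\ q \in \mathcal{M},\ q(\beta) \in \mathcal{Q}_\beta^X \text{ for } \lambda\text{-a.e. } \beta \in [0,1) \Big\}. \tag{16}$$

4.11 Theorem (risk identifiers for mixed superquantiles) For $X \in \mathcal{L}^2$, the set $\mathcal{Q}^X$ is convex and satisfies the following.

(i) If $Q \in \mathcal{Q}^X$, then $Q$ is a risk identifier of $\mathcal{R}$ at $X$.

(ii) If $\int 1/\sqrt{1-\beta}\, d\lambda(\beta) < \infty$, then $\mathcal{Q}^X$ is nonempty and weakly compact, and $Q \in \mathcal{Q}^X$ whenever $Q$ is a risk identifier of $\mathcal{R}$ at $X$. Moreover, $\bar Q := \int \bar q(\beta)\, d\lambda(\beta)$, where

$\bar q : [0,1) \to \mathcal{L}^2$, with $\bar q(\beta) = \bar Q_\beta^X$ (defined in (15)) for all $\beta \in [0,1)$,

is an element of $\mathcal{Q}^X$.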


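Proof. We first consider (i). Let $Q \in \mathcal{Q}^X$. There exist sequences $\{Q^\nu\}_{\nu=1}^\infty \subset \mathcal{L}^2$ and $\{q^\nu\}_{\nu=1}^\infty \subset \mathcal{M}$ such that $\|Q^\nu - Q\|_2 \to 0$, $Q^\nu = \int q^\nu(\beta)\, d\lambda(\beta)$, and $q^\nu(\beta) \in \mathcal{Q}_\beta^X$ for all $\nu$ and $\lambda$-a.e. $\beta \in [0,1)$. Then, for every $\nu$,

$$\mathcal{R}(X) = \int E[Xq^\nu(\beta)]\, d\lambda(\beta) = E\Big[X \int q^\nu(\beta)\, d\lambda(\beta)\Big] = E[XQ^\nu],$$

where the middle equality follows by the argument as in the proof of Theorem 4.6. Since by the Cauchy-Schwarz inequality $E[XQ^\nu] \to E[XQ]$, we also have that $\mathcal{R}(X) = E[XQ]$, which establishes (i).

Next, we consider (ii). Suppose that $\int 1/\sqrt{1-\beta}\, d\lambda(\beta) < \infty$. We proceed toward a contradiction. Suppose that $Q \in \mathcal{Q}$ is a risk identifier of $\mathcal{R}$ at $X$, but $Q \notin \mathcal{Q}^X$. Then there must exist a $q \in \mathcal{M}$ and $B \in \mathcal{B}_{[0,1)}$ such that $q(\beta) \in \mathcal{Q}_\beta$ for $\lambda$-a.e. $\beta \in [0,1)$, $\lambda(B) > 0$, and $q(\beta) \notin \mathcal{Q}_\beta^X$ for all $\beta \in B$. However, this implies that $E[Xq(\beta)] < E[XQ_\beta^X]$ for all $\beta \in B$ and any $Q_\beta^X \in \mathcal{Q}_\beta^X$. Consequently, $E[XQ] < \mathcal{R}(X)$, which is a contradiction.

Since $\mathcal{Q}$ is weakly compact by Theorem 4.6, the weak compactness of $\mathcal{Q}^X$ follows from it being a closed convex subset of $\mathcal{Q}$. Finally, we show that $\bar Q \in \mathcal{Q}^X$. The conclusion follows when we have shown that $\bar q \in \mathcal{M}$. By Proposition 4.10, $\bar q$ is continuous and therefore $(\mathcal{B}_{[0,1)}, \mathcal{B}_{\mathcal{L}^2})$-measurable. Since for $\beta \in (0,1)$

$$\begin{aligned} \|\bar Q_\beta^X\|_2^2 &= \int_{\{\omega \in \Omega \mid X(\omega) > G_X(\beta)\}} \frac{1}{(1-\beta)^2}\, dP(\omega) + \int_{\Omega_\beta(X)} \bigg[\frac{F_X(G_X(\beta)) - \beta}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)}\bigg]^2 dP(\omega) \\ &= \frac{1 - F_X(G_X(\beta))}{(1-\beta)^2} + \bigg[\frac{F_X(G_X(\beta)) - \beta}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)}\bigg]^2 \big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big) \\ &\le \frac{1}{1-\beta} + \frac{\big(F_X(G_X(\beta)) - \beta\big)^2}{(1-\beta)^2\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} \\ &\le \frac{1}{1-\beta} + \frac{(1-\beta)\big(F_X(G_X(\beta)) - \beta\big)}{(1-\beta)^2\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} \\ &\le \frac{1}{1-\beta} + \frac{F_X(G_X(\beta)) - F_X^-(G_X(\beta))}{(1-\beta)\big(F_X(G_X(\beta)) - F_X^-(G_X(\beta))\big)} = \frac{2}{1-\beta} \end{aligned}$$

and $\|\bar Q_0^X\|_2^2 = 1$, we find that

$$\int \|\bar q(\beta)\|_2\, d\lambda(\beta) \le \sqrt{2} \int \frac{1}{\sqrt{1-\beta}}\, d\lambda(\beta) < \infty.$$

Consequently $\bar q \in \mathcal{M}$ and $\bar Q = \int \bar q(\beta)\, d\lambda(\beta) \in \mathcal{Q}^X$, which completes the proof. □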

We observe that when $\int 1/\sqrt{1-\beta}\, d\lambda(\beta) = \infty$, there are random variables $X \in \mathcal{L}^2$ with $\mathcal{R}(X) = \infty$. In this case it might not be necessary to select $q$ in (16) with $q(\beta) \in \mathcal{Q}_\beta^X$ for $\lambda$-a.e. $\beta \in [0,1)$ because $\int E[Xq(\beta)]\, d\lambda(\beta)$ might still be infinity. For the special case of a second-order superquantile risk measure, we directly obtain the following corollary without this complication.


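4.12 Corollary For $\alpha \in [0,1)$ and $X \in \mathcal{L}^2$, the set

$$\bar{\mathcal{Q}}_\alpha^X := \mathrm{cl}\Big\{ Q \in \mathcal{L}^2 \;\Big|\; Q = \frac{1}{1-\alpha} \int_\alpha^1 q(\beta)\, d\beta,\ q \in \mathcal{M},\ q(\beta) \in \mathcal{Q}_\beta^X \text{ for } m\text{-a.e. } \beta \in [\alpha,1) \Big\}$$

is nonempty, convex, and weakly compact. Moreover,

$Q \in \bar{\mathcal{Q}}_\alpha^X$ if and only if $Q$ is a risk identifier of $\bar{\mathcal{R}}_\alpha$ at $X$. □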

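Further simplifications are possible in the case of second-order superquantile risk measures. As usual, we interpret 0 times $-\infty$ as zero in the following.

4.13 Theorem (further characterization of second-order superquantile risk identifiers) For $X \in \mathcal{L}^2$ and $\alpha \in [0,1)$, $\bar{\mathcal{Q}}_\alpha^X$ is the closure of elements $Q_\alpha^X \in \bar{\mathcal{Q}}_\alpha$ given, for a.e. $\omega \in \Omega$, by

$$Q_\alpha^X(\omega) = \begin{cases} \dfrac{1}{1-\alpha} \Big[ \log \dfrac{1-\alpha}{1-f(\omega)} + \displaystyle\int_{f(\omega)}^{F(\omega)} r_\beta^X(\omega)\, d\beta \Big] & \text{if } \alpha < f(\omega) < 1 \\[2ex] \dfrac{1}{1-\alpha} \displaystyle\int_\alpha^{F(\omega)} r_\beta^X(\omega)\, d\beta & \text{if } f(\omega) \le \alpha < F(\omega) \\[2ex] 0 & \text{otherwise,} \end{cases}$$

where $r_\beta^X \in \mathcal{L}^2$ satisfies (12) and $F(\omega) := F_X(X(\omega))$ and $f(\omega) := F_X^-(X(\omega))$.

The specific choice $\bar r_\beta^X \in \mathcal{L}^2$ given in (14) results in the risk identifier $\bar Q_\alpha^X \in \bar{\mathcal{Q}}_\alpha^X$ having, for a.e. $\omega \in \Omega$,

$$\bar Q_\alpha^X(\omega) = \begin{cases} \dfrac{1}{1-\alpha} \log \dfrac{1-\alpha}{1-F(\omega)} & \text{if } \alpha < f(\omega) = F(\omega) < 1 \\[2ex] \dfrac{1}{1-\alpha} \Big[ \log \dfrac{1-\alpha}{1-f(\omega)} + 1 + \dfrac{1-F(\omega)}{F(\omega)-f(\omega)} \log \dfrac{1-F(\omega)}{1-f(\omega)} \Big] & \text{if } \alpha < f(\omega) < F(\omega) \\[2ex] \dfrac{1}{1-\alpha} \Big[ \dfrac{F(\omega)-\alpha}{F(\omega)-f(\omega)} + \dfrac{1-F(\omega)}{F(\omega)-f(\omega)} \log \dfrac{1-F(\omega)}{1-\alpha} \Big] & \text{if } f(\omega) \le \alpha < F(\omega) \text{ and } f(\omega) < F(\omega) \\[2ex] 0 & \text{otherwise.} \end{cases}$$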

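Proof. For $\omega \in \Omega$ such that $\alpha < F_X^-(X(\omega)) < 1$,

$$\int_{\{\beta \in (\alpha,1) \mid X(\omega) > G_X(\beta)\}} \frac{1}{1-\beta}\, d\beta = \big[-\log(1-\beta)\big]_\alpha^{F_X^-(X(\omega))} = \log \frac{1-\alpha}{1-F_X^-(X(\omega))}.$$

By Proposition 4.9,

$$\begin{aligned} Q_\alpha^X(\omega) &= \frac{1}{1-\alpha} \int_{\{\beta \in (\alpha,1) \mid X(\omega) > G_X(\beta)\}} \frac{1}{1-\beta}\, d\beta + \frac{1}{1-\alpha} \int_{\{\beta \in (\alpha,1) \mid X(\omega) = G_X(\beta)\}} r_\beta^X(\omega)\, d\beta \\ &= \frac{1}{1-\alpha} \log \frac{1-\alpha}{1-F_X^-(X(\omega))} + \frac{1}{1-\alpha} \int_{F_X^-(X(\omega))}^{F_X(X(\omega))} r_\beta^X(\omega)\, d\beta, \end{aligned} \tag{17}$$

which proves the first claim. The second claim follows by a similar argument.

We next turn to the specific choice of $\bar r_\beta^X$. For $\alpha < F_X^-(X(\omega)) = F_X(X(\omega)) < 1$, the conclusion follows trivially. For $\alpha < F_X^-(X(\omega)) < F_X(X(\omega))$, integration gives that

$$\int_{F_X^-(X(\omega))}^{F_X(X(\omega))} \bar r_\beta^X(\omega)\, d\beta = \int_{F_X^-(X(\omega))}^{F_X(X(\omega))} \frac{F_X(X(\omega)) - \beta}{(1-\beta)\big(F_X(X(\omega)) - F_X^-(X(\omega))\big)}\, d\beta = 1 + \frac{1-F_X(X(\omega))}{F_X(X(\omega)) - F_X^-(X(\omega))} \log \frac{1-F_X(X(\omega))}{1-F_X^-(X(\omega))}$$

and the corresponding conclusion follows. The last case follows by a similar calculation. □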
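When the sample space is finite with all probabilities positive, every atom has $f(\omega) < F(\omega)$, and the two nonzero cases of the second half of Theorem 4.13 that have $f(\omega) < F(\omega)$ can be written as one expression with lower limit $\max\{f(\omega), \alpha\}$. The sketch below evaluates that expression; it is an illustration under these assumptions, and the function name `second_order_identifier` is hypothetical.

```python
import numpy as np

def second_order_identifier(x, p, alpha):
    """Second-order superquantile risk identifier (second half of Theorem 4.13)
    for outcomes x with probabilities p > 0 and level alpha in [0, 1)."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    vals, inv = np.unique(x, return_inverse=True)   # sorting dominates the cost
    mass = np.bincount(inv, weights=p)              # P(X = v) per distinct value
    F = np.minimum(np.cumsum(mass), 1.0)[inv]       # F(w) = F_X(X(w))
    m = mass[inv]                                   # F(w) - f(w) > 0 for each atom
    lo = np.maximum(F - m, alpha)                   # max{f(w), alpha}
    tail = 1.0 - F
    with np.errstate(divide="ignore", invalid="ignore"):
        # (1 - F) * log((1 - F) / (1 - lo)), with 0 * log(0) read as 0
        xlog = np.where(tail > 0.0, tail * np.log(tail / (1.0 - lo)), 0.0)
    Q = (np.log((1.0 - alpha) / (1.0 - lo)) + (F - lo + xlog) / m) / (1.0 - alpha)
    return np.where(F > alpha, Q, 0.0)              # zero when F(w) <= alpha
```

For two equally likely outcomes 1 and 2 and $\alpha = 0$, this gives the weights $1 - \log 2$ and $1 + \log 2$, so $E[Q] = 1$ and $E[XQ] = 1.5 + 0.5 \log 2$, which agrees with integrating the first-order superquantiles $1 + 0.5/(1-\beta)$ on $[0, 0.5)$ and $2$ on $[0.5, 1)$.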

The situation is especially simple for the following case.

4.14 Corollary Suppose that $F_X$ is continuous for $X \in \mathcal{L}^2$ and $\alpha \in [0,1)$. Then, $\bar{\mathcal{Q}}_\alpha^X$ is a singleton$^6$ with element given, for a.e. $\omega \in \Omega$, by

$$Q_\alpha^X(\omega) = \begin{cases} \dfrac{1}{1-\alpha} \log \dfrac{1-\alpha}{1-F_X(X(\omega))} & \text{if } \alpha < F_X(X(\omega)) < 1 \\[2ex] 0 & \text{otherwise.} \end{cases}$$

□

$^6$ Again, uniqueness is up to a set of $P$-measure zero.


It is obvious that expressions of risk identifiers provide alternative expressions for risk measures. Specifically, for $X \in \mathcal{L}^2$,

$$\mathcal{R}(X) = \sup_{Q \in \mathcal{Q}} \int X(\omega)Q(\omega)\, dP(\omega) = \int X(\omega)Q^X(\omega)\, dP(\omega),$$

for any $Q^X \in \mathcal{Q}^X$. In the case of the previous corollary, it is easy to see that the second-order superquantile risk takes the simple form

$$\bar{\mathcal{R}}_\alpha(X) = \frac{1}{1-\alpha} \int_{G_X(\alpha)}^{\infty} x \log \frac{1-\alpha}{1-F_X(x)}\, dF_X(x),$$

where $G_X(\alpha) = -\infty$ for $\alpha = 0$, which complements the expression of Theorem 3.4.
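As a quick check of this last formula, consider the illustrative case (not taken from the text) of $X$ uniformly distributed on $[0,1]$, so that $F_X(x) = x$ and $G_X(\alpha) = \alpha$ on $[0,1]$ and the first-order superquantile is $\bar G_X(\beta) = (1+\beta)/2$. The substitution $t = 1 - x$ evaluates the integral, and direct averaging of the first-order superquantiles over $[\alpha, 1)$ gives the same value:

```latex
\begin{aligned}
\frac{1}{1-\alpha}\int_\alpha^1 x \log\frac{1-\alpha}{1-x}\,dx
  &= \frac{1}{1-\alpha}\int_0^{1-\alpha} (1-t)\log\frac{1-\alpha}{t}\,dt
   = \frac{1}{1-\alpha}\Big[(1-\alpha) - \frac{(1-\alpha)^2}{4}\Big]
   = \frac{3+\alpha}{4}, \\
\frac{1}{1-\alpha}\int_\alpha^1 \frac{1+\beta}{2}\,d\beta
  &= \frac{1}{1-\alpha}\cdot\frac{(1-\alpha)(3+\alpha)}{4}
   = \frac{3+\alpha}{4}.
\end{aligned}
```

The first line uses $\int_0^a \log(a/t)\,dt = a$ and $\int_0^a t\log(a/t)\,dt = a^2/4$ with $a = 1-\alpha$.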


5  Applications  to  Optimization  and  Regression 

In  applications  arising  in  optimization  under  uncertainty  and  risk-averse  regression,  one  is  not  only 
interested  in  the  risk  of  a  single  random  variable  X,  but  rather  of  a  parameterized  family  of  random 
variables  over  which  the  “best”  is  to  be  selected  according  to  some  criterion  and  constraints.  When 
the  criterion  and/or  the  constraints  are  given  in  terms  of  measures  of  risk  applied  to  this  family  of 
random  variables,  we  obtain  optimization  problems  involving  parameterized  risk.  Properties  of  these 
measures  of  risk  as  functions  of  the  parameters  as  well  as  formulae  for  the  functions’  (sub)gradients 
become  central.  In  this  section,  we  discuss  optimization  problems  involving  parameterized  mixed  and 
second-order  superquantile  risk.  In  particular,  we  develop  expressions  for  subgradients  relying  on  the 
risk  identifiers  of  Section  4. 

We consider a family of random variables $X_u = g(u,\cdot)$, $u \in \mathbb{R}^n$, generated by the function $g : \mathbb{R}^n \times \Omega \to \mathbb{R}$. Consistent with the previous sections, we assume that $X_u \in \mathcal{L}^2$ for all $u \in \mathbb{R}^n$. For a weighting measure $\lambda$ and the corresponding mixed superquantile risk measure $\mathcal{R}$, as before given by

$$\mathcal{R}(X_u) = \int \bar G_{X_u}(\beta)\, d\lambda(\beta),$$

we get a function

$$f(u) := \mathcal{R}(X_u), \quad u \in \mathbb{R}^n, \tag{18}$$

representing parameterized risk. One might then proceed with determining a $u \in \mathbb{R}^n$ that

minimizes $f(u)$ over a subset of $\mathbb{R}^n$

or, alternatively, with determining a $u \in \mathbb{R}^n$ that

minimizes some criterion function of $u$ subject to $f(u) \le 0$ and possibly other constraints.

Algorithms such as cutting plane and bundle methods for solving these optimization problems require expressions for (sub)gradients of $f$. Justification for these approaches is provided by the Convexity Theorem of [19], which establishes that $f$ is convex whenever $g(\cdot,\omega)$ is convex for a.e. $\omega \in \Omega$.

In the remainder of the paper, we derive expressions for subgradients of $f$, but refrain from discussing full algorithms; see for example [12, 10, 24] for risk minimization algorithms based on dual approaches and [25] for related subgradient expressions. However, we end the paper with a discussion of primal and dual methods in the context of superquantile regression.


5.1  Subgradients  of  Parameterized  Risk 

We restrict the attention to the case with $\int 1/\sqrt{1-\alpha}\, d\lambda(\alpha) < \infty$, which ensures the finiteness of $\mathcal{R}$ on $\mathcal{L}^2$ and also the weak compactness of $\mathcal{Q}$. We equip $\mathbb{R}^n \times \mathcal{L}^2$ with the product topology generated by the norm topology on $\mathbb{R}^n$ and the weak topology on $\mathcal{L}^2$. The convergence of points in $\mathbb{R}^n \times \mathcal{L}^2$ in this weak sense is denoted by $\to_w$.

For notational convenience, we let $h : \mathbb{R}^n \times \mathcal{L}^2 \to \mathbb{R}$ be given by

$$h(u,Q) := \int g(u,\omega)Q(\omega)\, dP(\omega). \tag{19}$$

Properties of this function are established next.

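5.1 Proposition Consider $h$ in (19) and suppose for an open set $U \subset \mathbb{R}^n$ that

(i) there exists an $L \in \mathcal{L}^2$ such that

$$|g(u,\omega) - g(u',\omega)| \le L(\omega)\|u - u'\| \text{ for all } u,u' \in U \text{ and a.e. } \omega \in \Omega;$$

(ii) for every $i = 1,\ldots,n$, there exists an $\Omega_i \subset \Omega$, with $P\{\Omega_i\} = 1$, and an $L_i \in \mathcal{L}^2$ such that $\partial g(u,\omega)/\partial u_i$ exists for $u \in U$ and $\omega \in \Omega_i$, and

$$\Big| \frac{\partial g(u,\omega)}{\partial u_i} - \frac{\partial g(u',\omega)}{\partial u_i} \Big| \le L_i(\omega)\|u - u'\| \text{ for all } u,u' \in U \text{ and } \omega \in \Omega_i;$$

(iii) $g(v,\cdot),\ \partial g(v^i,\cdot)/\partial u_i \in \mathcal{L}^2$ for some $v, v^i \in U$, $i = 1,\ldots,n$.

Then, $h$ is weakly continuous on $U \times \mathcal{L}^2$ and $\nabla_u h$ exists and is likewise weakly continuous on $U \times \mathcal{L}^2$.

Proof. First we consider $h$, which is well-defined and finite on $U \times \mathcal{L}^2$ from assumptions (i) and (iii). Suppose that $(u^\nu,Q^\nu) \to_w (u,Q)$, with $u^\nu,u \in U$ and $Q^\nu,Q \in \mathcal{L}^2$. Then by the triangle and Cauchy-Schwarz inequalities and assumption (i),

$$\begin{aligned} |h(u^\nu,Q^\nu) - h(u,Q)| &\le \Big| \int [g(u^\nu,\omega) - g(u,\omega)]Q^\nu(\omega)\, dP(\omega) \Big| + \Big| \int g(u,\omega)[Q^\nu(\omega) - Q(\omega)]\, dP(\omega) \Big| \\ &\le \|g(u^\nu,\cdot) - g(u,\cdot)\|_2 \|Q^\nu\|_2 + \Big| \int g(u,\omega)[Q^\nu(\omega) - Q(\omega)]\, dP(\omega) \Big| \\ &\le (E[L^2])^{1/2} \|u^\nu - u\| \|Q^\nu\|_2 + \Big| \int g(u,\omega)[Q^\nu(\omega) - Q(\omega)]\, dP(\omega) \Big|. \end{aligned}$$

By the Uniform Boundedness Principle, $\{\|Q^\nu\|_2\}_{\nu=1}^\infty$ is bounded and the first term therefore vanishes. Since assumptions (i) and (iii) imply that $g(u,\cdot) \in \mathcal{L}^2$ for all $u \in U$, the second term vanishes by the weak convergence of $Q^\nu$ to $Q$.

Second we consider $\nabla_u h$. Following a standard argument and the Dominated Convergence Theorem (see for example the proof of Theorem 7.44 in [27]), we find that for every $u \in U$ and $Q \in \mathcal{L}^2$, $\nabla_u h(u,Q)$ exists and is given by

$$\nabla_u h(u,Q) = \int \nabla_u g(u,\omega)Q(\omega)\, dP(\omega).$$

Repeating the above argument with $g$ replaced by $\partial g/\partial u_i$ and assumption (i) by assumption (ii) establishes the claim about $\nabla_u h$. □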

In view of Proposition 5.1, the following conclusion is a direct consequence of [23, Theorem 10.31].

5.2 Theorem (subdifferentiability of $f$) Suppose that the assumptions of Proposition 5.1 hold. Then, $f$ in (18) is locally Lipschitz continuous on $U$ and strictly differentiable$^7$ where it is differentiable. There exists a set $D \subset U$ such that $U \setminus D$ is negligible$^8$, $f$ is differentiable on $D$, and the gradient $\nabla f$ is continuous relative to the set $D$.

Moreover, the directional derivative of $f$ at $u \in U$ in direction $v \in \mathbb{R}^n$ is

$$df(u)(v) = \max \big\{ \langle E[\nabla_u g(u,\cdot)Q], v \rangle \;\big|\; Q \in \mathcal{Q}^{g(u,\cdot)} \big\}$$

and the subdifferential of $f$ at $u \in U$ is

$$\partial f(u) = \mathrm{con} \big\{ E[\nabla_u g(u,\cdot)Q] \;\big|\; Q \in \mathcal{Q}^{g(u,\cdot)} \big\},$$

where $\mathcal{Q}^{g(u,\cdot)}$ is given in (16) with $X$ replaced by $g(u,\cdot)$. □

We observe that when $\lambda = \lambda_\alpha$, i.e., the focus is on a second-order superquantile risk measure $\bar{\mathcal{R}}_\alpha$, then $\mathcal{Q}^{g(u,\cdot)}$ is fully characterized by Theorem 4.13. In particular, the latter half of that theorem provides a specific risk identifier $Q \in \mathcal{Q}^{g(u,\cdot)}$ that is easily calculated when $\Omega$ has finite cardinality. Such a risk identifier then provides the subgradient $E[\nabla_u g(u,\cdot)Q]$ of $f$, which also is easily calculated in this case.

5.2  Application  to  Superquantile  Regression 

Superquantile regression as laid out in [16] (see also [14]) resembles quantile regression, but instead of estimating conditional quantiles, it focuses on conditional superquantiles. Specifically, we find that for $Y \in \mathcal{L}^2$ and $\alpha \in (0,1)$,

$$\{\bar G_Y(\alpha)\} = \operatorname*{argmin}_{u_0 \in \mathbb{R}} \mathcal{E}_\alpha(Y - u_0), \quad \text{where } \mathcal{E}_\alpha(Y) := \mathcal{V}_\alpha(Y) - E[Y]$$

is a measure of error given in terms of the measure of regret$^9$

$$\mathcal{V}_\alpha(Y) := \frac{1}{1-\alpha} \int_0^1 \max\{0, \bar G_Y(\beta)\}\, d\beta.$$

In the same manner as minimizing mean-squared error yields an expectation and the foundation for least-squares regression, and minimizing a Koenker-Basset error yields a quantile and the foundation for quantile regression, minimizing $\mathcal{E}_\alpha$ leads to superquantile regression.

$^7$ Recall that $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is strictly differentiable at a point $\bar x$ if $f(\bar x)$ is finite and there is a vector $v \in \mathbb{R}^n$ such that $(f(x') - f(x) - \langle v, x' - x \rangle)/|x' - x| \to 0$ whenever $x, x' \to \bar x$ and $x' \ne x$; see [23, Definition 9.17].

$^8$ A subset of a set of Lebesgue measure zero is negligible.

Superquantile regression deals with the problem of approximating a random variable $Y \in \mathcal{L}^2$ by a combination of more accessible random variables $X_1, X_2, \ldots, X_n \in \mathcal{L}^2$, such that the error as quantified by $\mathcal{E}_\alpha$ is minimized. Hopefully, the knowledge of $X = (X_1, \ldots, X_n)$ would then provide reasonably accurate predictions of $Y$. Limiting the scope to affine regression functions, superquantile regression then needs to solve the problem

$$\min_{u_0 \in \mathbb{R},\, u \in \mathbb{R}^n} \mathcal{E}_\alpha\big(Y - [u_0 + \langle u, X \rangle]\big)$$

to obtain regression coefficients $u_0$ and $u$. That is, the regression coefficients $(u_0,u)$ are selected such that the error between $Y$ and the model $u_0 + \langle u, X \rangle$ is minimized.

We show in [16] that this problem can be decomposed into the two problems

$$\text{(i) find } u \in \operatorname*{argmin}_{u \in \mathbb{R}^n} \frac{1}{1-\alpha} \int_\alpha^1 \bar G_{g(u,\cdot)}(\beta)\, d\beta - E[g(u,\cdot)] \quad \text{and} \quad \text{(ii) find } u_0 = \bar G_{g(u,\cdot)}(\alpha),$$

where for each $u \in \mathbb{R}^n$,

$$g(u,\cdot) = Y - \langle u, X \rangle$$

is a random variable defined on the sample space $\Omega = \mathbb{R}^{n+1}$, with sigma-algebra $\mathcal{B}_{\mathbb{R}^{n+1}}$, and probability $P$ given by the distribution of $(X,Y)$. The problem (i) is that of minimizing a second-order superquantile of $g(u,\cdot)$ minus the expectation of $g(u,\cdot)$. Since $E[g(u,\cdot)] = E[Y] - \langle u, E[X] \rangle$ is a deterministic quantity, this problem is essentially in the form discussed earlier in the section: to minimize a mixed superquantile risk measure, in fact a second-order superquantile risk measure.

Suppose that the distribution $P$ is supported on the points $\{(x^j,y^j)\}_{j=1}^\nu \subset \mathbb{R}^{n+1}$ with $P\{(x^j,y^j)\} = p^j$, $j = 1,\ldots,\nu$, as is the case in practice when the regression relies on the observed data $\{(x^j,y^j)\}_{j=1}^\nu$. Then, the evaluation at a given $u \in \mathbb{R}^n$ of the objective function

$$f(u) = \frac{1}{1-\alpha} \int_\alpha^1 \bar G_{g(u,\cdot)}(\beta)\, d\beta - E[g(u,\cdot)]$$

of problem (i) and a corresponding subgradient are achieved as follows: Determine the cumulative distribution function of $g(u,\cdot)$ and use the formula in the second half of Theorem 4.13, with $X$ replaced by $g(u,\cdot)$, to determine a risk identifier $\bar Q_\alpha^{g(u,\cdot)}$. This computation can be obtained in $O(\nu \log \nu)$ time, with sorting of $\{y^j - \langle u, x^j \rangle\}_{j=1}^\nu$ to obtain the cumulative distribution function being the bottleneck. Then, in view of Theorem 5.2, the function value and a subgradient are readily available through

$$f(u) = \sum_{j=1}^\nu p^j \big(y^j - \langle u, x^j \rangle\big) \bar Q_\alpha^{g(u,\cdot)}(\omega^j) - \sum_{j=1}^\nu p^j \big(y^j - \langle u, x^j \rangle\big)$$

and

$$\nabla f(u) = -\sum_{j=1}^\nu p^j x^j \bar Q_\alpha^{g(u,\cdot)}(\omega^j) + \sum_{j=1}^\nu p^j x^j, \quad \text{where } \omega^j = (x^j,y^j).$$

$^9$ We refer to [19] for a general treatment of measures of error and regret.

We note that the assumptions of Proposition 5.1 are easily verified in this case due, in part, to the affine form of $g(\cdot,\omega)$. Consequently, each iteration of a cutting-plane method or bundle method requires computational time of order $O(\nu \log \nu)$ as a function of the number of data points. The number of iterations needed would depend on the method, $n$ (the number of explanatory variables), and other factors. In comparison, a "primal" method proposed in [16] for solving the same problem requires the solution of a linear program with $n + O(\nu^2)$ variables and $O(\nu^2)$ inequality constraints. It is therefore clear that for small $n$ and large $\nu$, which is typical in regression problems, a dual method relying on the expressions derived in this paper might outperform the linear-programming-based approach. In fact, even storage of the linear program becomes challenging for large $\nu$.
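The evaluation of the objective of problem (i) and of one subgradient, as described above for data-supported distributions, can be sketched as follows. This is an illustration only, assuming distinct-or-tied residuals with probabilities $p^j > 0$; the names `second_order_identifier` and `objective_and_subgradient` are hypothetical, and the helper evaluates the second half of Theorem 4.13 with lower limit $\max\{f(\omega), \alpha\}$ covering both cases that have $f(\omega) < F(\omega)$.

```python
import numpy as np

def second_order_identifier(z, p, alpha):
    """Risk identifier from the second half of Theorem 4.13 for residuals z
    carrying probabilities p > 0 (every atom then has F - f > 0)."""
    vals, inv = np.unique(z, return_inverse=True)   # sorting: O(nu log nu)
    mass = np.bincount(inv, weights=p)
    F = np.minimum(np.cumsum(mass), 1.0)[inv]
    m = mass[inv]
    lo = np.maximum(F - m, alpha)
    with np.errstate(divide="ignore", invalid="ignore"):
        xlog = np.where(F < 1.0, (1.0 - F) * np.log((1.0 - F) / (1.0 - lo)), 0.0)
    Q = (np.log((1.0 - alpha) / (1.0 - lo)) + (F - lo + xlog) / m) / (1.0 - alpha)
    return np.where(F > alpha, Q, 0.0)

def objective_and_subgradient(u, X, y, p, alpha):
    """Value of f(u) in problem (i) and one subgradient, for data rows X[j],
    responses y[j], and probabilities p[j]."""
    u = np.asarray(u, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    z = y - X @ u                          # realizations of g(u, .) = Y - <u, X>
    Q = second_order_identifier(z, p, alpha)
    value = p @ (z * (Q - 1.0))            # sum_j p_j z_j Q_j - sum_j p_j z_j
    subgrad = X.T @ (p * (1.0 - Q))        # -sum_j p_j x_j Q_j + sum_j p_j x_j
    return float(value), subgrad
```

A cutting-plane or bundle method would call `objective_and_subgradient` once per iteration; the vector it returns satisfies the subgradient inequality $f(u') \ge f(u) + \langle \nabla f(u), u' - u \rangle$ because the identifier computed at $u$ remains feasible in the risk envelope at every other $u'$.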